Vol. 45, Parts 1 and 2 June 1958




BIOMETRIKA




FOUNDED BY 


W. F. R. WELDON, FRANCIS GALTON and KARL PEARSON


MANAGING EDITOR 


E. S. PEARSON


ASSOCIATE EDITOR 
M. G. KENDALL 


ISSUED BY 
THE BIOMETRIKA OFFICE, UNIVERSITY COLLEGE, LONDON 


PRINTED AT THE UNIVERSITY PRESS, CAMBRIDGE 


[Issued 8 June 1958]








This volume of Biometrika is published with the co-operation of 


F. N. DAVID N. L. JOHNSON 
J. DURBIN D. G. KENDALL 
J. B. S. HALDANE D. V. LINDLEY 
M. J. R. HEALY R. L. PLACKETT 


A volume containing about 500 pages will be published annually in two half-yearly issues, 
appearing in June and December. 

Papers for publication should be sent either to 

PROFESSOR E. S. PEARSON 
Department of Statistics, University College, London, W.C. 1 
or if more convenient to 
PROFESSOR M. G. KENDALL 
London School of Economics, Houghton Street, London, W.C. 2 

It is a condition of publication in Biometrika that the paper shall not already have been
issued elsewhere, and will not be reprinted without leave of the Editors.


Contributors receive 25 copies of their papers free. Joint authors 15 copies each. Order 
forms for separates are sent to authors with proofs of their papers. 


SUBSCRIPTIONS 
The subscription price, payable in advance, is: 
£2. 14s. (or $8.00) net per volume including packing and postage


BACK ISSUES 
Volumes 20-44 
These may be obtained from the BIOMETRIKA OFFICE, at the following prices: 
Vols. 20-39: £5. 5s. (or $15.00) per volume; Vols. 40-44: £3. 10s. (or $10.00) per volume 
including packing and postage 
Bound volumes: £1 (or $3.00) extra per volume; Binding cases: 10s. (or $1.50) each 
Cheques should be made payable to Biometrika, crossed ‘a/c BIOMETRIKA TRUST’ and
sent to THE SECRETARY, BIOMETRIKA OFFICE, UNIVERSITY COLLEGE, GOWER STREET,
LONDON, W.C. 1, to whom all orders for series, single copies and offprints should be
addressed.








Volumes 1-19 


Permission for reprinting has been granted to Messrs Wm. Dawson & Sons 
Ltd. Volumes 1-19 are now ready for distribution, price £130 bound. 
Would librarians and others wishing to have copies please place their orders
now with: 














Wm. DAWSON & SONS LTD., 
16 WEST STREET, FARNHAM, SURREY, ENGLAND







































NUMERICAL STUDIES IN THE SEQUENTIAL ESTIMATION OF 
A BINOMIAL PARAMETER 


By P. ARMITAGE 


Statistical Research Unit of the Medical Research Council, 
London School of Hygiene and Tropical Medicine* 


1. INTRODUCTION 


1.1. The literature on sequential estimation procedures is rather scanty. Some authors
such as Haldane (1945) have investigated the problem of estimation when sampling is
conducted according to particular stopping rules, and Girshick, Mosteller & Savage (1946)
have given a general method for obtaining unbiased estimates of a binomial parameter for
fairly general stopping rules. Anscombe (1953) has proposed certain stopping rules for
which fixed-sample-size formulae are asymptotically valid, and Ray (1957) has investigated
the small-sample properties of some similar procedures. Cox (1952) has considered asymptotic
properties of sequential estimation procedures for general boundaries. The problem
of obtaining exact estimation methods for general stopping rules, or even for the most
widely known system of sequential designs, that due to Wald, is still unsolved.

Wald’s system was not intended primarily as a means of estimation. Nevertheless, it 
may frequently happen that an investigator chooses a Wald-type procedure, or some other, 
for good reasons, and then wishes to carry out some sort of estimation procedure for the 
parameter. How misleading will it be to use fixed-sample-size formulae, in spite of the fact
that the observations were made sequentially?

The unbiased estimator given by Girshick et al. (1946) may, in particular instances, differ
appreciably from the maximum likelihood estimator which would normally be used
in fixed-size sampling. But unbiasedness is not a property necessarily required of an
estimator, and it would be interesting to know what degree of bias is associated with the
usual estimator.

An analytical approach to this problem does not appear to be simple, since it involves a 
knowledge of sample-space distributions which is not in general available in a simple form. 
For binomial sampling with any particular boundaries these distributions can be computed 
exactly, though perhaps laboriously, and hence many questions like those discussed above 
can be answered unambiguously. In this paper results are presented for three particular 
designs: two truncated Wald schemes and a ‘restricted’ procedure of the type described by 
Armitage (1957). The extent to which one can generalize from a sample of three is, of course, 
debatable, but the results for the three procedures have some features in common, which 
are discussed in the final section of the paper. 

For each design, two questions are investigated: the distributions of the maximum 
likelihood estimator and of the unbiased estimator, for various values of the parameter; 
and secondly, the establishment of confidence intervals for the unknown parameter. These 
two aspects of the investigation are discussed separately in the following sections. 


* This work was completed, and the paper written, while the author was a visiting scientist at the
National Institutes of Health, Bethesda, Maryland, U.S.A.







2 Sequential estimation of a binomial parameter 


1.2. Bias. If $r$ successes have been observed out of $n$ binomial trials, the maximum
likelihood estimator of the unknown probability of a success, $\theta$, is $\hat\theta = r/n$, irrespective of
the stopping rules under which the observations have been obtained. For fixed-size sampling,
$\hat\theta$ is unbiased. For any particular sequential stopping rule, $\hat\theta$ may be expected in general
to be biased. A procedure for obtaining an unbiased estimator, for a wide class of sequential
stopping rules, has been given by Girshick et al. (1946). This is given by $\hat\theta_u = N'/N$, where
$N$ is the number of permissible orders (i.e. those not prohibited by a stopping rule) in which
the observed results could have been obtained, and $N'$ is the number of permissible orders
subject to the restriction that the first observation is a success.

For particular sequential boundaries, $N'$ and $N$ can be obtained by enumeration of the
number of admissible paths from the points (1, 1) and (0, 0) to the point $(n, r)$ on a lattice
diagram in which the numbers of trials and successes are plotted as abscissa and ordinate,
respectively. For designs with linear boundaries, the enumeration can conveniently be
effected by matrix multiplication (Stockman & Armitage, 1946).

Although $\hat\theta_u$ is unbiased, its distribution may conceivably have features making it less
satisfactory than the biased maximum likelihood estimator, $\hat\theta$. The distributions of $\hat\theta$ and $\hat\theta_u$
have been obtained by calculation of the exact probability of reaching each boundary point,
for various values of $\theta$. For the boundary point $(n_0, r_0)$, this probability is $N\theta^{r_0}(1-\theta)^{n_0-r_0}$.
Probabilities were calculated to eight decimal places. The computations were to some
extent recursive and were subject to cumulative errors, but the totals over all boundary
points differed from unity by at most 1 unit in the seventh decimal place in Examples 1
and 3, and at most 5 in the fifth place in Example 2.
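The enumeration of admissible paths lends itself to a short recursion. The following is a minimal sketch rather than the author's own procedure (the text cites matrix multiplication for linear boundaries). The function names are hypothetical, and the stopping rule is an assumption chosen for illustration: the rounded boundaries of Example 1 below, with every point reached at n = 40 treated as terminal.

```python
from fractions import Fraction

N_MAX = 40  # truncation point, as in Example 1

def stopped(n, r):
    """Assumed stopping rule: r >= 1.5 + 0.75n or r <= -1.25 + 0.75n
    (written in integer arithmetic), or the truncation point is reached."""
    return 4 * r >= 3 * n + 6 or 4 * r <= 3 * n - 5 or n >= N_MAX

def count_paths(start):
    """Count admissible paths from `start` to every boundary point."""
    counts = {}
    frontier = {start: 1}  # all points in the frontier share the same n
    while frontier:
        nxt = {}
        for (n, r), ways in frontier.items():
            if stopped(n, r):
                counts[(n, r)] = ways
            else:
                for success in (0, 1):
                    key = (n + 1, r + success)
                    nxt[key] = nxt.get(key, 0) + ways
        frontier = nxt
    return counts

N = count_paths((0, 0))        # all permissible orders
N_first = count_paths((1, 1))  # orders whose first observation is a success
theta_u = {pt: Fraction(N_first.get(pt, 0), N[pt]) for pt in N}

def total_probability(theta):
    """Sum of N * theta^r * (1 - theta)^(n - r) over all boundary points."""
    return sum(c * theta**r * (1 - theta)**(n - r) for (n, r), c in N.items())
```

Under these assumptions `theta_u[(5, 2)]` is 2/3 and `theta_u[(6, 3)]` is 5/7, which agree with the unbiased estimates 0.667 and 0.714 listed in Table 1, and `total_probability` gives a quick check that the boundary probabilities sum to unity.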


1.3. Confidence intervals. If $r_0$ successes out of $n_0$ are observed in fixed-size sampling,
a central confidence interval with confidence coefficient $1-2\gamma$ is customarily obtained as
$(\underline\theta, \bar\theta)$, where $P(r \ge r_0 \mid \underline\theta) = P(r \le r_0 \mid \bar\theta) = \gamma$. In repeated sampling from a population with
any value $\theta$, the probability that the statement ‘$\underline\theta < \theta < \bar\theta$’ is true is at least $1-2\gamma$.
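The classical limits can also be found numerically instead of by interpolation in binomial tables. The sketch below solves the two defining equations by bisection; the function names are hypothetical, and `None` is used here (as an assumption of this sketch) where no limit exists, that is, when no successes or no failures are observed.

```python
from math import comb

def binom_cdf(k, n, theta):
    """P(r <= k) for r ~ Binomial(n, theta)."""
    return sum(comb(n, j) * theta**j * (1 - theta)**(n - j) for j in range(k + 1))

def solve_increasing(f, target, iters=60):
    """Bisection on (0, 1) for f(theta) = target, with f increasing in theta."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def classical_limits(r0, n0, gamma):
    """Central limits: P(r >= r0 | lower) = gamma and P(r <= r0 | upper) = gamma."""
    lower = None
    if r0 > 0:
        # P(r >= r0 | theta) = 1 - P(r <= r0 - 1 | theta) increases with theta
        lower = solve_increasing(lambda t: 1 - binom_cdf(r0 - 1, n0, t), gamma)
    upper = None
    if r0 < n0:
        # P(r <= r0 | theta) = gamma is the same as P(r > r0 | theta) = 1 - gamma
        upper = solve_increasing(lambda t: 1 - binom_cdf(r0, n0, t), 1 - gamma)
    return lower, upper

low, high = classical_limits(2, 5, 0.05)  # 90% limits for 2 successes in 5 trials
```

For 2 successes in 5 trials with $\gamma = 0.05$, the limits round to 0.08 and 0.81, the classical 90% values given for that point in Table 1.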

In repeated sampling according to any prescribed sequential stopping rule, the probability
statement made in the last sentence is no longer necessarily true. Even if, for some
particular $\theta$, the probability that $\theta$ is included in the interval is greater than the nominal
$1-2\gamma$, its value will in general differ from that given by fixed-size sampling. The first
question, then, is: what is the probability that the ‘classical’ confidence interval includes $\theta$,
for repeated sampling under these sequential stopping rules? Once the probability distributions
over the boundary points have been calculated, as indicated in § 1.2, this question
can easily be answered, for the classical confidence interval can be obtained by interpolation
in tables of the cumulative binomial distribution (National Bureau of Standards, 1949;
Romig, 1953).

The second question in this connexion is how to set up, for a given sequential design, a
system of confidence intervals satisfying the required probability statement in repetitions
of the sequential sampling. This can be done as follows. In any sequential binomial procedure
with fixed boundary points, the latter may be ordered in terms of increasing $\hat\theta$,
points with identical values of $\hat\theta$ being arranged in some arbitrary order. For some designs
this ordering will correspond to a natural ordering in terms of proximity in the $(n, r)$ plane.
For other designs an ordering in terms of proximity may involve slight departures from one
in terms of increasing $\hat\theta$. For any two boundary points, $B_1$ and $B_2$, we shall write $B_1 > B_2$
or $B_1 < B_2$ according as $B_1$ follows or precedes $B_2$, respectively, in an ordering which corresponds
to increasing $\hat\theta$ (apart from possible slight departures of the type referred to above).










P. ARMITAGE 3 


Then, the sequential confidence interval for $\theta$ with coefficient $1-2\gamma$, at any point $B_0$, is
defined as $(\underline\theta', \bar\theta')$, where $P(B \ge B_0 \mid \underline\theta') = P(B \le B_0 \mid \bar\theta') = \gamma$. That this system satisfies the
required probability statement clearly follows by an argument analogous to that used in
deriving the classical interval, provided that the distribution function over the boundary
points is a monotonic function of $\theta$. The latter condition will not necessarily be fulfilled for
any ordering of the type described above, but it appears to be fulfilled in the particular
instances considered below. A similar method may be followed to obtain sequential confidence
intervals for other than binomial sampling, but only the binomial case will be
considered in detail here.

Sufficient conditions for the monotonicity of the distribution function with regard to $\theta$
(and hence for the existence of a set of confidence intervals) are that (a) the boundary
points are ordered in terms of $\hat\theta$; and (b) the probability is unity that sampling will end at
one of the boundary points, for all $\theta$.

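For, let the boundary points be $(n_i, r_i)$ $(i = 1, 2, \ldots, k)$, define $\hat\theta_i = r_i/n_i$ and denote the
values of $N$ by $N_i$. If (a) is satisfied, we have

$$\hat\theta_i \le \hat\theta_{i+1} \quad (i = 1, 2, \ldots, k-1). \tag{1}$$

The distribution function $P(s, \theta)$ will be defined as the sum of the probabilities, given $\theta$,
of reaching the points $(n_i, r_i)$ for $i = 1, 2, \ldots, s$. Then, if (b) is satisfied, $P(s, \theta)$ can be written
in either of two forms (for $s = 1, 2, \ldots, k-1$):

$$P = \sum_{i=1}^{s} N_i\,\theta^{r_i}(1-\theta)^{n_i-r_i} \tag{2}$$

$$\phantom{P} = 1 - \sum_{i=s+1}^{k} N_i\,\theta^{r_i}(1-\theta)^{n_i-r_i}. \tag{3}$$

Hence

$$\frac{\partial P}{\partial\theta} = \sum_{i=1}^{s} N_i\,\theta^{r_i-1}(1-\theta)^{n_i-r_i-1}(r_i - n_i\theta) \tag{4}$$

$$\phantom{\frac{\partial P}{\partial\theta}} = -\sum_{i=s+1}^{k} N_i\,\theta^{r_i-1}(1-\theta)^{n_i-r_i-1}(r_i - n_i\theta). \tag{5}$$

By (1) and (4), for $\hat\theta_s < \theta < 1$, $\partial P/\partial\theta < 0$; by (1) and (5), for $0 < \theta < \hat\theta_{s+1}$, $\partial P/\partial\theta < 0$. This proves
that $\partial P/\partial\theta < 0$ for all $\theta$ in (0, 1), except possibly in two situations: (i) for $\theta = \hat\theta_s$ in the case
$\hat\theta_s = \hat\theta_{s+1}$; but in this case $\partial P/\partial\theta \le 0$, the equality holding only if $\hat\theta_i$ is constant for all $i$,
and this possibility is excluded by condition (b); (ii) for $s = k$; here $P(s, \theta) = 1$ and $\partial P/\partial\theta = 0$;
this exception is trivial and raises no difficulties.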
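The step from (2) to (4) rests on differentiating a single term of the sum. Spelled out, for one boundary point with $r_i$ successes in $n_i$ trials and $\hat\theta_i = r_i/n_i$:

```latex
\frac{\partial}{\partial\theta}\,\theta^{r_i}(1-\theta)^{n_i-r_i}
  = \theta^{r_i-1}(1-\theta)^{n_i-r_i-1}\bigl[\,r_i(1-\theta) - (n_i-r_i)\,\theta\,\bigr]
  = \theta^{r_i-1}(1-\theta)^{n_i-r_i-1}\,(r_i - n_i\theta),
\qquad
r_i - n_i\theta = n_i(\hat\theta_i - \theta).
```

Each term of the derivative therefore has the sign of $\hat\theta_i - \theta$, which is why the ordering of the boundary points by $\hat\theta_i$ fixes the sign of the whole sum on either side of $\hat\theta_s$ and $\hat\theta_{s+1}$.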

Conditions (a) and (b) are sufficient for monotonicity, but not necessary. Simple examples
may be constructed, in which the boundary points are not ordered strictly in terms of $\hat\theta$,
but for which, nevertheless, the distribution functions are monotonic.

In the examples below, the distributions over the sequential boundary points have been
obtained for suitable values of $\theta$, and the limits $\underline\theta'$ and $\bar\theta'$ have been obtained for each boundary
point by interpolation. It is thus possible to compare the two sets of intervals, which
for convenience we shall call ‘classical’ and ‘sequential’, and observe in what respects they
differ. In addition, I have calculated for various values of $\theta$ the probability that the true
value is excluded from each set of intervals. This gives us some idea of the extent to which
we may be misled, in using classical limits, when the sampling is sequential.

Interpolation in $P$ with respect to $\theta$ can conveniently be done after transformation of
both $P$ and $\theta$ to equivalent normal deviates. Linear interpolation on the transformed
variables has usually been found to be reliable. Low values of $\underline\theta$ and $\underline\theta'$, or high values of
$\bar\theta$ and $\bar\theta'$, which occur at a few of the extreme boundary points, could not be obtained by
interpolation, and were calculated to the required accuracy by exact or iterative solution
of the appropriate algebraic equation. The tabulated values of the confidence intervals,
and those of the unbiased estimates, may occasionally be inaccurate by one unit in the last
digit quoted, owing to rounding-off errors.
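To make the interpolation device concrete, the sketch below interpolates linearly after converting both the probability and the parameter to normal deviates. The function names are hypothetical, and the test case is an assumption chosen because its exact answer is known: a boundary point reached only by two initial failures, so that its probability is $(1-\theta)^2$, tabulated at a spacing of 0.05 in $\theta$.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution

def probit_interpolate(theta_a, p_a, theta_b, p_b, p_target):
    """Find theta with P(theta) = p_target from two tabulated pairs,
    interpolating linearly between normal deviates of both P and theta."""
    z = _STD.inv_cdf
    frac = (z(p_target) - z(p_a)) / (z(p_b) - z(p_a))
    z_theta = z(theta_a) + frac * (z(theta_b) - z(theta_a))
    return _STD.cdf(z_theta)

def p_two_failures(theta):
    """Probability that the first two trials are both failures."""
    return (1 - theta) ** 2

estimate = probit_interpolate(0.75, p_two_failures(0.75),
                              0.80, p_two_failures(0.80), 0.05)
exact = 1 - 0.05 ** 0.5  # root of (1 - theta)^2 = 0.05
```

In this case the interpolated value agrees with the exact root (about 0.776) to well within one unit in the third decimal place.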


2. EXAMPLE 1 


Consider a Wald probability ratio sequential test, designed to distinguish between two
values of a binomial parameter: $\theta_0 = 0.50$ (with a probability of error of the first kind,
$\alpha = 0.025$), and $\theta_1 = 0.92$ (with a probability of error of the second kind, $\beta = 0.05$). The two
linear boundaries have equations

$$r = 1.489 + 0.750n,$$

and

$$r = -1.216 + 0.750n.$$

For convenience we shall round off the coefficients, and consider the closely similar pair of
boundaries with equations

$$r = 1.50 + 0.75n,$$

and

$$r = -1.25 + 0.75n.$$
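The unrounded coefficients can be reproduced from the usual Wald approximations for a binomial probability ratio test. The sketch below assumes those standard formulae (the text does not restate them), and the function name is hypothetical.

```python
from math import log

def wald_boundaries(theta0, theta1, alpha, beta):
    """Intercepts and common slope of the lines r = h + s*n for a binomial
    probability ratio test of theta0 against theta1 (theta1 > theta0),
    using Wald's approximate thresholds (1 - beta)/alpha and beta/(1 - alpha)."""
    denom = log(theta1 * (1 - theta0) / (theta0 * (1 - theta1)))
    slope = log((1 - theta0) / (1 - theta1)) / denom
    upper_intercept = log((1 - beta) / alpha) / denom
    lower_intercept = log(beta / (1 - alpha)) / denom
    return upper_intercept, lower_intercept, slope

upper, lower, slope = wald_boundaries(0.50, 0.92, 0.025, 0.05)
```

With the values of Example 1 this gives intercepts of about 1.489 and -1.216 and a slope of about 0.750, matching the equations quoted above.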


The maximum value of the average sample number is about 10 (according to the usual 
approximating formula) and we might expect the operating characteristic to be affected 
to a fairly small degree if the procedure is truncated at a sample size of about 30 or more. 
We have truncated at n = 40, and the two boundary points with n = 40 have been allocated 
one to each boundary. The co-ordinates of the boundary points are shown in the first two
columns of Table 1. The points are arranged, for convenience, in order of increasing or
decreasing $n$ on each boundary, rather than increasing $\hat\theta$. The boundaries are illustrated
in Fig. 1.

The third and fourth columns of Table 1 show the maximum likelihood and unbiased
estimates of $\theta$, respectively. An unexpected feature is the rapidity with which the unbiased
estimator approaches a value of about 0.80 as $n$ increases; for all $n > 15$, $\hat\theta_u$ lies between
0.799 and 0.801. Table 2 shows the mean, variance, and mean-square error about $\theta$, of the
two estimates, for various values of $\theta$. Notice first that $\hat\theta$ is biased towards 0 or 1, away from
an intermediate value between 0.8 and 0.9. This provides a reason for the phenomenon
noticed above, that $\hat\theta_u$ differs from $\hat\theta$ in being brought closer to a value of about 0.80. A consequence
of this situation is that for $\theta$ = 0.7, 0.8 and 0.9 (for which values there is a relatively
high chance of terminating at high values of $n$, where $\hat\theta_u$ is fairly constant), the variance of
$\hat\theta_u$ is less than that of $\hat\theta$. For more extreme values of $\theta$ (0.60 and less, or 0.95 and greater),
the variance of $\hat\theta_u$ is greater than that of $\hat\theta$, and (except for $\theta$ = 0.6) greater even than the
mean-square error of $\hat\theta$.
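Moments of this kind follow directly from the boundary-point probabilities. The sketch below is an illustration under stated assumptions, not a reproduction of Table 2: it uses the rounded boundaries of this example, treats every point reached at n = 40 as terminal, and uses hypothetical function names.

```python
def boundary_counts(n_max=40):
    """Return {(n, r): (N, N_first)} for the assumed stopping rule, where N
    counts all admissible paths and N_first those starting with a success."""
    out = {}
    frontier = {(0, 0): (1, 0)}
    while frontier:
        nxt = {}
        for (n, r), (ways, ways_s) in frontier.items():
            if 4 * r >= 3 * n + 6 or 4 * r <= 3 * n - 5 or n >= n_max:
                out[(n, r)] = (ways, ways_s)
                continue
            for success in (0, 1):
                key = (n + 1, r + success)
                a, b = nxt.get(key, (0, 0))
                # at the first trial only a success starts a "first success" path
                first = ways * success if n == 0 else ways_s
                nxt[key] = (a + ways, b + first)
        frontier = nxt
    return out

def moments(theta, counts):
    """Mean, variance and mean-square error about theta of both estimators."""
    rows = [(N * theta**r * (1 - theta)**(n - r), r / n, N1 / N)
            for (n, r), (N, N1) in counts.items()]
    result = {}
    for name, col in (("ml", 1), ("unbiased", 2)):
        mean = sum(row[0] * row[col] for row in rows)
        mse = sum(row[0] * (row[col] - theta) ** 2 for row in rows)
        # variance = mean-square error minus squared bias
        result[name] = {"mean": mean, "variance": mse - (mean - theta) ** 2, "mse": mse}
    return result

counts = boundary_counts()
at_03 = moments(0.3, counts)
```

At $\theta$ = 0.3, for instance, the mean of the estimator $N'/N$ equals 0.3 up to rounding error, while the mean of $r/n$ falls below 0.3, in line with the direction of bias described above.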

Girshick et al. (1946) point out that if the region inside the boundaries has a narrow throat,
with only one accessible point for some value of $n$, then $\hat\theta_u$ will be constant for all boundary
points with higher values of $n$. In the present example the width of the accessible region
(for $n < 38$) varies between 2 and 3 points, but apparently the region is sufficiently narrow
to produce a similar effect. The unbiased estimator, incidentally, is unique since the region
is ‘simple’, in the sense of Girshick et al. (1946).







For calculation of confidence limits, the boundary points have been arranged, for convenience,
in the order shown in Table 1, which coincides with the ordering in terms of
increasing $\hat\theta_u$, but differs slightly from that in terms of increasing $\hat\theta$. The probability distributions
over the boundary points were calculated for $\theta$ = 0.30 (0.05) 0.95, and for the
additional values 0.475, 0.525, 0.975. The two sets of confidence limits, ‘classical’ and
‘sequential’, were calculated as described above, for confidence coefficients of 0.90 and
0.95. These are given in Table 1, and for the latter, visually in Fig. 2. The sequential limits
tend to be wider apart than the classical limits, for the higher values of $n$. For lower values
of $n$, both the upper and the lower sequential limits are displaced relative to the classical
limits, being higher for low values of $\hat\theta$ and lower for high values of $\hat\theta$. In other words, they
are displaced towards the fairly constant values which they assume at high values of $n$.
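The sequential limits can be sketched in the same way as the classical ones, by solving for the value of the parameter at which the cumulative probability over the ordered boundary points equals the required level. The code below is a hedged illustration, not the interpolation method used for Table 1. The function names are hypothetical, and the ordering is an assumption: lower-boundary points by increasing n, then any points reached at n = 40, then upper-boundary points by decreasing n.

```python
def ordered_boundary_points(n_max=40):
    """Return [(n, r, N)] for the rounded boundaries of Example 1, in the
    assumed order described in the lead-in above."""
    lower, final, upper = [], [], []
    frontier = {(0, 0): 1}
    while frontier:
        nxt = {}
        for (n, r), ways in frontier.items():
            if 4 * r <= 3 * n - 5:          # r <= -1.25 + 0.75n
                lower.append((n, r, ways))
            elif 4 * r >= 3 * n + 6:        # r >= 1.5 + 0.75n
                upper.append((n, r, ways))
            elif n == n_max:                # truncation
                final.append((n, r, ways))
            else:
                for success in (0, 1):
                    key = (n + 1, r + success)
                    nxt[key] = nxt.get(key, 0) + ways
        frontier = nxt
    return sorted(lower) + sorted(final) + sorted(upper, reverse=True)

def cdf_upto(points, s, theta):
    """Probability of stopping at one of the points with index 0..s."""
    return sum(N * theta**r * (1 - theta)**(n - r) for n, r, N in points[:s + 1])

def solve_decreasing(f, target, iters=60):
    """Bisection on (0, 1) for f(theta) = target, with f decreasing in theta."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sequential_limits(points, s, gamma):
    """Limits at the point with index s: P(B >= B_s | lower) = gamma and
    P(B <= B_s | upper) = gamma; None where no limit exists."""
    lower = None
    if s > 0:  # P(B >= B_s) = 1 - P(B <= B_{s-1})
        lower = solve_decreasing(lambda t: cdf_upto(points, s - 1, t), 1 - gamma)
    upper = None
    if s < len(points) - 1:
        upper = solve_decreasing(lambda t: cdf_upto(points, s, t), gamma)
    return lower, upper

points = ordered_boundary_points()
limits_3_1 = sequential_limits(points, 1, 0.05)  # 90% limits at the point (3, 1)
```

For the point (3, 1) the 90% limits come out near 0.03 and 0.86, the sequential values shown for that point in Table 1. Limits at the early lower-boundary points do not depend on how the points at n = 40 are ordered.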



























































Fig. 1. Boundaries for Example 1. (Horizontal axis: number of trials, $n$.)

Fig. 2. 95% confidence limits for $\theta$ (Example 1). (Horizontal axis: sample number at boundary
point, lower boundary followed by upper boundary. Curves shown: sequential limits,
classical limits and maximum likelihood estimate.)


The probabilities of exclusion of the true value, $\theta$, from each set of intervals, are given in
Table 2, for various values of $\theta$. We encounter here two difficulties familiar in problems of
confidence intervals with discrete distributions: the probabilities of exclusion above $\bar\theta'$ or
below $\underline\theta'$ are in general less than the nominal value $\gamma$; and exclusion above $\bar\theta'$, or below $\underline\theta'$,
is impossible for sufficiently low, or high, values of $\theta$, respectively. However, Table 2 shows
that the probability of exclusion beyond one of the classical limits can be greater than the
nominal value $\gamma$, at least for values of $\theta$ near the middle of the range; and that it tends to
be considerably less than $\gamma$ for extremely high or extremely low values of $\theta$. These results
are in line with the general nature of the differences between the two sets of intervals,
commented on above.

It may be of interest to note that the exact probabilities of reaching the upper boundary,
for $\theta$ = 0.5 and 0.925 (which is close to the value of 0.92, initially considered) are respectively
0.0250 (agreeing well with the nominal value of 0.025), and 0.9729 (as compared with the
nominal 0.95).

3. EXAMPLE 2 


The design used here was intended as a truncated version of a probability ratio sequential
test to distinguish between the hypotheses that $\theta = \theta_0 = 0.05$ and $\theta = \theta_1 = 0.15$, with equal
probabilities of error of the first and second kinds ($\alpha = \beta = 0.05$). The scheme is truncated
at $n = 70$. After rounding off the coefficients, the equations of the boundaries were taken
to be $r = 2.5 + 0.10n$ and $r = -2.5 + 0.10n$. The rounded-off values are actually fairly close
to those appropriate for $\theta_0 = 0.05$, $\theta_1 = 0.17$, $\alpha = \beta = 0.03$. The co-ordinates of the boundary
points are given in Table 4, and the boundaries are illustrated in Fig. 3.

In this example, no calculations of unbiased estimates were made. For calculation
of 95% confidence limits ($\gamma = 0.025$), the boundary points were arranged in the order
shown in Table 3, which again differs to some extent from the ordering in terms of increasing
$\hat\theta$. The probability distributions over the boundary points were calculated for
$\theta$ = 0.03 (0.02) 0.21, and, on the upper boundary only, for $\theta$ = 0.3 (0.1) 0.8. The two sets
of limits are given in Table 4 and are shown in Fig. 4. They are considerably less discrepant
than was the case in Example 1. There is no marked tendency for the sequential limits to
be wider apart than the classical limits, for large $n$. There is, however, the same relative
displacement noted in Example 1, but to a much smaller degree; that is, the sequential
limits are usually higher than the classical limits on the lower boundary, and lower on the
upper boundary. Table 3 shows that for some values of $\theta$ the probability that the interval
excludes $\theta$, in a specified direction, may be appreciably higher than the specified upper
bound of 0.025.















































Fig. 3. Boundaries for Example 2. (Horizontal axis: number of trials, $n$.)

Fig. 4. 95% confidence limits for $\theta$ (Example 2). (Horizontal axis: sample number at boundary
point. Curves shown: sequential limits, classical limits and maximum likelihood estimate.)


Table 3 also gives the mean and variance of the maximum likelihood estimator, $\hat\theta$, for
the same values of $\theta$. The bias in $\hat\theta$ is of the same character as that described in Example 1:
for the lower values of $\theta$, $\hat\theta$ is biased downwards, while for the higher values the bias is
upwards.

We note finally that the exact probabilities of reaching the upper boundary, for $\theta$ = 0.05
and 0.17, are respectively 0.031 (corresponding to the nominal value of 0.03), and 0.922
(corresponding to the nominal value of 0.97). The approximate formulae are apparently
rather misleading here.

4. EXAMPLE 3


This is one of the ‘restricted sequential procedures’ described by Armitage (1957). The
maximum number of observations is 44. On the upper boundary the ratio of the likelihoods
of $\theta$ = 0.8 and $\theta$ = 0.5 is constant; on the lower boundary the likelihood ratio of $\theta$ = 0.5
to $\theta$ = 0.2 is constant. The procedure has the properties that when $\theta$ = 0.5 the probabilities
of reaching each of the two outer boundaries are 0.020; and when $\theta$ = 0.8 (0.2) the probability
of reaching the upper (lower) boundary is 0.947. The boundary points are given in
Table 5 and illustrated in Fig. 5 (cf. also Fig. 2 of Armitage (1957), where different co-ordinates
are used). Since the design is symmetrical, only the boundary points with $r < \tfrac{1}{2}n$
are tabulated. Thus, the upper portion of the middle boundary consists of the points
$n = 27, 28, \ldots, 44$; $r = n - 13$. The upper boundary consists of the points $n = 8, 11, \ldots, 44$;
$r = (8 + 2n)/3$.
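The untabulated half of the design can be listed from the two formulae just given. The short sketch below does so; the variable names are hypothetical, and the mirror rule $(n, r) \to (n, n - r)$ is an assumption about what ‘symmetrical’ means here, namely interchange of successes and failures.

```python
# Upper outer boundary: n = 8, 11, ..., 44 with r = (8 + 2n)/3 (an integer for these n)
upper = [(n, (8 + 2 * n) // 3) for n in range(8, 45, 3)]

# Upper portion of the middle boundary: n = 27, 28, ..., 44 with r = n - 13
middle_upper = [(n, n - 13) for n in range(27, 45)]

def mirror(points):
    """Interchange successes and failures (assumed form of the symmetry)."""
    return [(n, n - r) for n, r in points]

lower = mirror(upper)                # lower outer boundary under that assumption
middle_lower = mirror(middle_upper)  # lower portion of the middle boundary

# Maximum likelihood estimate r/n at each upper-boundary point
ml_upper = [(n, r, r / n) for n, r in upper]
```

Under that assumption the lower outer boundary runs from (8, 0) to (44, 12), and the lower portion of the middle boundary lies along $r = 13$.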

The unbiased estimates are displaced, relative to the maximum likelihood estimates,
towards a value of about 0.375. Similarly, for the boundary points not shown in Table 5,
the displacement is towards a value of 0.625. Table 6 shows that the maximum likelihood
estimator, $\hat\theta$, is biased away from the value 0.5, at least at those values of $\theta$ for which computations
have been carried out. The unbiased estimator, $\hat\theta_u$, has larger variance and larger
mean-square error than $\hat\theta$, for a range of values of $\theta$ furthest from 0.5, namely, 0.2 or less,
and 0.8 or greater.
























































Fig. 5. Boundaries for Example 3. (Horizontal axis: number of trials, $n$.)

Fig. 6. 95% confidence limits for $\theta$ (Example 3). (Horizontal axis: sample number at boundary
point, for the lower, middle and upper boundaries. Curves shown: sequential limits,
classical limits and maximum likelihood estimate.)


The region of accessible points in this restricted procedure is not ‘simple’ in the sense of
Girshick et al. (1946) and the unbiased estimator is therefore not unique. However, alternative
unbiased estimators differ only in the values assigned to the three central boundary
points of the middle boundary.

The two sets of 95% confidence limits are given in Table 5 and illustrated in Fig. 6. As in
Example 2, the widths of the two intervals at any boundary point are fairly similar. As in
the previous examples the sequential interval tends to be displaced upwards, relative to the
classical interval, at low values of $\hat\theta$. For points on the middle boundary with $0.35 < \hat\theta < 0.50$,
on the other hand, the mid-point of the sequential interval is displaced away from 0.5,
relative to that of the classical interval. The probabilities of exclusion are given in Table 6.
For the classical intervals these probabilities just exceed the nominal value of 0.025, when
$\theta$ = 0.5, but otherwise remain less than or equal to the probabilities for the sequential
limits.

5. DISCUSSION

Examples 1 and 2 both involve parallel-line boundaries. In each example two features
were observed: for extreme values of $\theta$, the maximum likelihood estimator $\hat\theta$ is biased (in a
more extreme direction); and for boundary points corresponding to extreme values of $\hat\theta$ the
sequential confidence limits are displaced in relation to the classical limits (in a less
extreme direction). For sufficiently high or sufficiently low values of $\theta$, the probability of
reaching one boundary is much higher than that of reaching the other, and (as Cox (1952)
remarks) the properties of the double-boundary procedure should be similar to those of
a single-boundary procedure.

Cox (1952) has shown that in binomial sampling with a single linear boundary, $\hat\theta$ is
asymptotically biased in the direction observed above. That is, if the equation to the
boundary is $r = a + bn$ ($a > 0$, $b > 0$), then $\hat\theta$ is biased upwards, for $\theta > b$. This result may be
obtained also by considering the appropriate diffusion approximation.

As was pointed out in § 2, a natural consequence of the direction of the bias in $\hat\theta$ is that
$\hat\theta_u$ tends to be displaced towards a central value. For values of $\theta$ sufficiently close to this
central value, $\hat\theta_u$ tends to have a smaller mean-square error than has $\hat\theta$. The values of $\theta$ for
which the reverse situation holds are all sufficiently near to 0 or 1 to yield a high probability
of hitting a particular one of the boundaries (one of the outer boundaries in Example 3).
Indeed, the values of $\theta$ for which $\hat\theta$ and $\hat\theta_u$ have equal mean-square errors appear to be fairly
close (in both Examples 1 and 3) to the values of $\theta$ appearing in the specification of the
boundaries (0.50 and 0.92 in Example 1, and 0.2 and 0.8 in Example 3); but there is no
evident reason why this should generally be true.

The other observation, of the direction of displacement of the confidence intervals, can
also be verified for single-boundary sampling, by the following direct argument. Let the
equation to the boundary be $r = a + bn$ ($a > 0$, $b > 0$), and let $B_0$ be the boundary point
$(n_0, r_0)$, where $\hat\theta_0 = r_0/n_0$. Suppose that a sample path, when continued sufficiently far,
crosses the sequential boundary at a point where $\hat\theta = \hat\theta_s$ and the fixed-sample-size boundary
$n = n_0$ at a point where $\hat\theta = \hat\theta_f$. Assume that $\hat\theta_s$ is a decreasing function of $n$. Then, any path
with $\hat\theta_f > \hat\theta_0$ must have crossed the sequential boundary at $n < n_0$, which implies $\hat\theta_s > \hat\theta_0$.
Hence,

$$P(\hat\theta_s > \hat\theta_0) \ge P(\hat\theta_f > \hat\theta_0),$$

and

$$P(\hat\theta_s \ge \hat\theta_0) \ge P(\hat\theta_f \ge \hat\theta_0).$$

It follows that $\underline\theta' < \underline\theta$ and $\bar\theta' < \bar\theta$, where, as before, $(\underline\theta, \bar\theta)$ are the fixed-sample-size limits and
$(\underline\theta', \bar\theta')$ are the sequential limits.

These results apply similarly to the lower boundary; thus, for $\theta < b$, $\hat\theta$ is biased downwards,
$\underline\theta' > \underline\theta$ and $\bar\theta' > \bar\theta$.

In Example 1, but not appreciably in Example 2, the sequential limits are wider apart
than the classical limits, for boundary points corresponding to high values of $n$. There is
reason to believe that this will be a general finding if truncation is performed at sufficiently
high values of $n$. For, consider a Wald procedure to distinguish between the hypotheses
that $\theta = \theta_0$ and $\theta = \theta_1$, with both probabilities of error equal to $\gamma$. At points infinitely far
up either boundary the $100(1 - 2\gamma)$% sequential confidence interval will be $(\theta_0, \theta_1)$, whereas
the width of the classical interval will tend to zero as $n \to \infty$. It seems a reasonable conjecture
that the probability of exclusion from the classical interval will exceed that from the
sequential interval when $\theta$ is close to $b$. The position is somewhat obscured in our examples
by the effects of discontinuity, but what evidence there is confirms the conjecture.

It should be noted that sets of central confidence intervals exist, other than those given 
by the method described in this paper, and some of these may more closely resemble the 
classical limits than do those considered here. 










Example 3 differs from Examples 1 and 2 in having a design with two ‘channels’, which
rather confuses the problem for intermediate values of $\theta$. However, for extremely high (or
extremely low) values of $\theta$, only the upper (or lower) boundary need be considered, and the
situation is similar to that discussed in relation to the first two examples. We find that $\hat\theta$
is biased away from 0.5, and the sequential confidence limits are displaced towards 0.5.
The close similarity between the two systems of limits in this example is interesting. It
may be a reflexion of the fact that for all except very extreme values of $\theta$ there is a fairly
high probability of reaching the middle boundary, where the variability in $n$ is relatively
small. This type of procedure was developed in an attempt to avoid the very high variability
in sample number which is associated with parallel-line procedures.


SUMMARY 


A method is described of obtaining confidence limits for a binomial probability, for a class 
of sequential procedures with fixed boundaries. Confidence limits, and unbiased estimates 
of the parameter, have been calculated for all the boundary points in three closed sequential 
designs: two truncated Wald procedures, and one ‘restricted’ procedure. Reasons are 
suggested for two observed tendencies: at boundary points which are reached after a small 
number of observations, corresponding to high or low values of the estimated probability, 
the sequential confidence limits are shifted in a less extreme direction relative to the limits 
given by the usual fixed-sample-size formulae; for extreme values of the parameter the 
maximum likelihood estimator is biased in a more extreme direction, and the unbiased 
estimator is correspondingly shifted in a less extreme direction. If limits are based on 
fixed-sample-size formulae, the probability of exclusion of the true value does not appear 
to differ grossly from the nominal value. The ‘classical’ and sequential limits are particularly 
close for the ‘restricted’ design. 


I am indebted to Miss Irene Allen for computational assistance. 


REFERENCES 


ANSCOMBE, F. J. (1953). Sequential estimation. J.R. Statist. Soc. Ser. B, 15, 1-29.

ARMITAGE, P. (1957). Restricted sequential procedures. Biometrika, 44, 9-26.

COX, D. R. (1952). A note on the sequential estimation of means. Proc. Camb. Phil. Soc. 48, 447-50.

GIRSHICK, M. A., MOSTELLER, F. & SAVAGE, L. J. (1946). Unbiased estimates for certain binomial
sampling problems with applications. Ann. Math. Statist. 17, 13-23.

HALDANE, J. B. S. (1945). On a method of estimating frequencies. Biometrika, 33, 222-5.

NATIONAL BUREAU OF STANDARDS (1949). Tables of the Binomial Probability Distribution. Applied
Mathematics, Ser. 6. Washington: Government Printing Office.

RAY, W. D. (1957). Sequential confidence intervals for the mean of a normal population with unknown
variance. J.R. Statist. Soc. Ser. B, 19, 133-43.

ROMIG, H. G. (1953). 50-100 Binomial Tables. New York: Wiley and Sons.

STOCKMAN, C. M. & ARMITAGE, P. (1946). Some properties of closed sequential schemes. J.R. Statist.
Soc. Suppl. 8, 104-12.







Sequential estimation of a binomial parameter 


Table 1. Boundary points, maximum likelihood and unbiased estimates of $\theta$,
and 90 and 95 % confidence limits for $\theta$ (Example 1)

              Estimates            90 % confidence limits         95 % confidence limits
  n   r    Max. lik. Unbiased     Classical      Sequential      Classical      Sequential
                                 lower  upper   lower  upper    lower  upper   lower  upper

Lower boundary
  2   0     0.000    0.000         —    0.78      —    0.78       —    0.84      —    0.84
  3   1     0.333    0.500       0.02   0.86    0.03   0.86     0.01   0.91    0.01   0.91
  5   2     0.400    0.667       0.08   0.81    0.14   0.87     0.05   0.85    0.09   0.91
  6   3     0.500    0.714       0.15   0.85    0.21   0.88     0.12   0.88    0.16   0.92
  7   4     0.571    0.750       0.22   0.87    0.29   0.89     0.18   0.90    0.23   0.92
  9   5     0.556    0.778       0.25   0.83    0.35   0.89     0.21   0.86    0.29   0.92
 10   6     0.600    0.786       0.30   0.85    0.38   0.90     0.26   0.88    0.33   0.92
 11   7     0.636    0.792       0.35   0.86    0.42   0.90     0.31   0.89    0.37   0.93
 13   8     0.615    0.796       0.36   0.84    0.45   0.90     0.32   0.86    0.40   0.93
 14   9     0.643    0.798       0.39   0.85    0.46   0.90     0.35   0.87    0.42   0.93
 15  10     0.667    0.799       0.42   0.86    0.48   0.90     0.38   0.88    0.44   0.93
 17  11     0.647    0.799       0.42   0.83    0.50   0.90     0.38   0.86    0.46   0.93
 18  12     0.667    0.800       0.45   0.84    0.51   0.90     0.41   0.87    0.46   0.93
 19  13     0.684    0.800       0.47   0.85    0.52   0.90     0.43   0.88    0.47   0.93
 21  14     0.667    0.800       0.46   0.83    0.52   0.90     0.43   0.85    0.48   0.93
 22  15     0.682    0.800       0.48   0.84    0.53   0.90     0.45   0.86    0.48   0.93
 23  16     0.696    0.800       0.50   0.85    0.53   0.90     0.47   0.87    0.49   0.93
 25  17     0.680    0.800       0.50   0.83    0.54   0.90     0.46   0.85    0.49   0.93
 26  18     0.692    0.800       0.51   0.84    0.54   0.90     0.48   0.86    0.49   0.93
 27  19     0.704    0.800       0.53   0.84    0.54   0.90     0.50   0.86    0.50   0.93
 29  20     0.690    0.800       0.52   0.83    0.54   0.90     0.49   0.85    0.50   0.93
 30  21     0.700    0.800       0.54   0.83    0.54   0.90     0.51   0.85    0.50   0.93
 31  22     0.710    0.800       0.55   0.84    0.55   0.90     0.52   0.86    0.50   0.93
 33  23     0.697    0.800       0.54   0.83    0.55   0.90     0.51   0.84    0.50   0.93
 34  24     0.706    0.800       0.55   0.83    0.55   0.90     0.52   0.85    0.50   0.93
 35  25     0.714    0.800       0.56   0.84    0.55   0.90     0.54   0.85    0.50   0.93
 36  26     0.722    0.800       0.57   0.84    0.55   0.90     0.55   0.86    0.50   0.93
 37  27     0.730    0.800       0.58   0.85    0.55   0.90     0.56   0.86    0.50   0.93
 38  28     0.737    0.800       0.59   0.85    0.55   0.90     0.57   0.87    0.50   0.93
 39  29     0.744    0.800       0.60   0.85    0.55   0.90     0.58   0.87    0.50   0.93
 40  30     0.750    0.800       0.61   0.86    0.55   0.90     0.59   0.87    0.50   0.93

Upper boundary
 40  31     0.775    0.800       0.64   0.88    0.55   0.90     0.62   0.89    0.50   0.93
 38  30     0.789    0.800       0.65   0.89    0.55   0.91     0.63   0.90    0.50   0.93
 34  27     0.794    0.800       0.65   0.90    0.55   0.91     0.62   0.91    0.50   0.93
 30  24     0.800    0.800       0.64   0.91    0.55   0.91     0.61   0.92    0.50   0.93
 26  21     0.808    0.800       0.64   0.92    0.55   0.91     0.61   0.94    0.50   0.93
 22  18     0.818    0.800       0.63   0.94    0.55   0.92     0.60   0.95    0.50   0.94
 18  15     0.833    0.801       0.62   0.95    0.55   0.94     0.59   0.96    0.50   0.95
 14  12     0.857    0.806       0.61   0.97    0.56   0.96     0.57   0.98    0.50   0.97
 10   9     0.900    0.833       0.61   0.995   0.57   0.991    0.56   0.998   0.51   0.996
  6   6     1.000    1.000       0.61     —     0.61     —      0.54     —     0.54     —
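The 'classical' columns of Table 1 can be checked numerically. The sketch below assumes that the classical limits are the exact fixed-sample binomial limits obtained by solving the two tail equations with tail area $\gamma$ (an assumption, although it reproduces several rows of the table). The function names are illustrative only, and the sequential limits are not computed here.

```python
from math import comb

def binom_cdf(r, n, theta):
    """P(R <= r) for R ~ Binomial(n, theta)."""
    return sum(comb(n, k) * theta**k * (1.0 - theta)**(n - k) for k in range(r + 1))

def solve_decreasing(f, tol=1e-10):
    """Root on [0, 1] of a function that falls from positive to negative."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def classical_limits(r, n, gamma):
    """Fixed-sample limits for r successes in n trials, tail area gamma each side.

    Lower limit solves P(R >= r) = gamma; upper limit solves P(R <= r) = gamma.
    A limit that does not exist (r = 0 or r = n) is returned as None.
    """
    lower = None if r == 0 else solve_decreasing(
        lambda t: binom_cdf(r - 1, n, t) - (1.0 - gamma))
    upper = None if r == n else solve_decreasing(
        lambda t: binom_cdf(r, n, t) - gamma)
    return lower, upper

if __name__ == "__main__":
    # Row n = 10, r = 6 of Table 1, 95 % limits (gamma = 0.025)
    print(classical_limits(6, 10, 0.025))
```

For the row $n = 10$, $r = 6$ this gives limits that round to 0.26 and 0.88 at $\gamma = 0.025$, in agreement with the tabulated classical 95 % limits.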























P. ARMITAGE


Table 2. Characteristics of the distributions of the maximum likelihood estimator and the
unbiased estimator; also the probabilities of exclusion from the confidence intervals
(Example 1)

Columns headed '>U' give the probability that $\theta$ lies above the upper limit, and those
headed '<L' the probability that it lies below the lower limit.

                                                  Mean       Probability of exclusion of θ
            Mean              Variance           square     90 % limits (γ = 0.05)        95 % limits (γ = 0.025)
                                                 error      Classical     Sequential      Classical     Sequential
   θ     Max.lik. Unbiased  Max.lik.  Unbiased  Max.lik.    >U     <L     >U     <L       >U     <L     >U     <L

 0.30    0.2017   0.3000    0.04322   0.09230   0.05288    0      0.015  0      0.025    0      0.007  0      0.015
 0.40    0.2772   0.4000    0.05193   0.09922   0.06701    0      0.013  0      0.039    0      0.008  0      0.024
 0.50    0.3631   0.5000    0.06018   0.09586   0.07891    0      0.027  0      0.045    0      0.026  0      0.025
 0.60    0.4683   0.6000    0.06970   0.08379   0.08705    0      0.098  0      0.047    0      0.004  0      0
 0.70    0.6055   0.7000    0.07656   0.06513   0.08549    0      0      0      0        0      0      0      0
 0.80    0.7709   0.8000    0.06159   0.04263   0.06244    0.040  0      0.040  0        0      0      0      0
 0.90    0.9145   0.9059    0.02278   0.01965   0.02299    0.058  0      0.047  0        0.030  0      0.010  0
 0.95    0.9642   0.9500    0.00736   0.00917   0.00756    0.015  0      0.027  0        0.015  0      0.015  0
 0.975   0.9836   0.9750    0.00259   0.00439   0.00266    0.002  0      0.021  0        0.005  0      0.021  0

Table 3. Mean and variance of the distribution of $\hat\theta$, and the probabilities of
exclusion from the confidence intervals (Example 2)

                                   Probability of exclusion of θ
          Distribution of θ̂        from 95 % confidence interval
                                   Classical        Sequential
   θ       Mean     Variance       >U      <L       >U      <L

 0.03     0.0224    0.00076        0       0.014    0       0.014
 0.05     0.0444    0.00157        0       0.031    0       0.024
 0.07     0.0670    0.00454        0       0.046    0       0.024
 0.09     0.0981    0.00755        0       0.032    0       0.021
 0.11     0.1319    0.01044        0       0.024    0       0.024
 0.13     0.1654    0.01294        0       0.030    0       0.024
 0.15     0.1970    0.01520        0.032   0.036    0.017   0.012
 0.17     0.2262    0.01744        0.035   0.022    0.022   0.017
 0.19     0.2534    0.01980        0.023   0.024    0.023   0.024







Table 4. Boundary points, maximum likelihood estimates of $\theta$,
and 95 % confidence limits for $\theta$ (Example 2)

                          95 % confidence limits
  n   r     θ̂          Classical         Sequential
                      lower   upper     lower   upper

Lower boundary
 25   0    0.000        —       …         —     0.14
 35   1    0.029      0.001     …       0.001   0.16
 45   2    0.044      0.005     …       0.007   0.17
 55   3    0.055      0.01      …       0.01    0.17
 65   4    0.062      0.02      …       0.02    0.18
 68   5    0.074      0.02      …       0.03    0.18
 69   6    0.087      0.03      …       0.03    0.19
 70   7    0.100      0.04      …       0.04    0.20

Upper boundary
 70   8    0.114      0.05      …       0.05    0.20
 69   8    0.116      0.05      …       0.05    0.20
 68   8    0.118      0.05      …       0.05    0.21
 67   8    0.119      0.05      …       0.05    0.21
 66   8    0.121      0.05      …       0.05    0.21
 65   8    0.123      0.06      …       0.05    0.22
 64   8    0.125      0.06      …       0.05    0.22
 63   8    0.127      0.06      …       0.05    0.23
 62   8    0.129      0.06      …       0.05    0.24
 61   8    0.131      0.06      …       0.05    0.25
 60   8    0.133      0.06      …       0.05    0.25
 59   8    0.136      0.06      …       0.05    0.26
 58   8    0.138      0.06      …       0.05    0.26
 57   8    0.140      0.06      …       0.05    0.26
 56   8    0.143      0.06      …       0.05    0.27
 55   8    0.146      0.06      …       0.05    0.27
 54   8    0.148      0.07      …       0.05    0.27
 53   8    0.151      0.07      …       0.05    0.28
 52   8    0.154      0.07      …       0.05    0.28
 51   8    0.157      0.07      …       0.05    0.28
 50   8    0.160      0.07      …       0.05    0.28
 49   8    0.163      0.07      …       0.05    0.28
 48   8    0.167      0.07      …       0.05    0.28
 47   8    0.170      0.08      …       0.05    0.28
 45   7    0.156      0.06      …       0.05    0.29
 44   7    0.159      0.07      …       0.05    0.29
 43   7    0.163      0.07      …       0.05    0.29
 42   7    0.167      0.07      …       0.05    0.29
 41   7    0.171      0.07      …       0.06    0.29
 40   7    0.175      0.07      …       0.06    0.30




























Table 4 (continued)

                          95 % confidence limits
  n   r     θ̂          Classical         Sequential
                      lower   upper     lower   upper

Upper boundary
 39   7    0.179      0.08    0.34      0.06    0.30
 38   7    0.184      0.08    0.34      0.06    0.30
 37   7    0.189      0.08    0.35      0.06    0.30
 35   6    0.171      0.06    0.34      0.06    0.31
 34   6    0.176      0.07    0.35      0.06    0.33
 33   6    0.182      0.07    0.36      0.06    0.34
 32   6    0.188      0.07    0.37      0.06    0.35
 31   6    0.194      0.07    0.38      0.06    0.36
 30   6    0.200      0.08    0.39      0.06    0.37
 29   6    0.207      0.08    0.40      0.06    0.37
 28   6    0.214      0.08    0.41      0.06    0.38
 27   6    0.222      0.09    0.42      0.06    0.38
 25   5    0.200      0.07    0.41      0.06    0.39
 24   5    0.208      0.07    0.42      0.06    0.39
 23   5    0.217      0.07    0.44      0.07    0.40
 22   5    0.227      0.08    0.45      0.07    0.42
 21   5    0.238      0.08    0.47      0.07    0.44
 20   5    0.250      0.09    0.49      0.07    0.46
 19   5    0.263      0.09    0.51      0.07    0.47
 18   5    0.278      0.10    0.54      0.07    0.48
 17   5    0.294      0.10    0.56      0.07    0.49
 15   4    0.267      0.08    0.55      0.08    0.52
 14   4    0.286      0.08    0.58      0.08    0.55
 13   4    0.308      0.09    0.61      0.09    0.58
 12   4    0.333      0.10    0.65      0.09    0.61
 11   4    0.364      0.11    0.69      0.10    0.66
 10   4    0.400      0.12    0.74      0.11    0.70
  9   4    0.444      0.14    0.79      0.12    0.76
  8   4    0.500      0.16    0.84      0.13    0.80
  7   4    0.571      0.18    0.90      0.14    0.85
  5   3    0.600      0.15    0.95      0.15    0.94
  4   3    0.750      0.19    0.994     0.19    0.992
  3   3    1.000      0.29      —       0.29      —






















Table 5. Boundary points, maximum likelihood and unbiased estimates of $\theta$,
and 95 % confidence limits for $\theta$ (Example 3)

              Estimates of θ            95 % confidence limits
  n   r    Max. lik.  Unbiased       Classical         Sequential
                                    lower   upper     lower   upper

Lower boundary
  8   0     0.000     0.000           —     0.37        —     0.37
 11   1     0.091     0.125         0.002   0.41      0.003   0.43
 14   2     0.143     0.192         0.02    0.43      0.02    0.45
 17   3     0.176     0.234         0.04    0.43      0.05    0.46
 20   4     0.200     0.263         0.06    0.44      0.07    0.47
 23   5     0.217     0.284         0.07    0.44      0.09    0.48
 26   6     0.231     0.300         0.09    0.44      0.11    0.48
 29   7     0.241     0.312         0.10    0.44      0.12    0.49
 32   8     0.250     0.323         0.11    0.43      0.14    0.49
 35   9     0.257     0.331         0.12    0.43      0.15    0.49
 38  10     0.263     0.338         0.13    0.43      0.16    0.49
 41  11     0.268     0.344         0.14    0.43      0.17    0.49
 44  12     0.273     0.349         0.15    0.43      0.17    0.49

Lower half of middle boundary
 44  13     0.295     0.349         0.17    0.45      0.18    0.49
 43  13     0.302     0.349         0.17    0.46      0.18    0.49
 42  13     0.310     0.349         0.18    0.47      0.18    0.49
 41  13     0.317     0.350         0.18    0.48      0.19    0.50
 40  13     0.325     0.351         0.19    0.49      0.19    0.50
 39  13     0.333     0.354         0.19    0.50      0.19    0.51
 38  13     0.342     0.357         0.20    0.51      0.20    0.51
 37  13     0.351     0.361         0.20    0.53      0.20    0.52
 36  13     0.361     0.366         0.21    0.54      0.21    0.53
 35  13     0.371     0.372         0.21    0.55      0.22    0.54
 34  13     0.382     0.379         0.22    0.56      0.22    0.55
 33  13     0.394     0.387         0.23    0.58      0.23    0.57
 32  13     0.406     0.396         0.24    0.59      0.24    0.58
 31  13     0.419     0.407         0.24    0.61      0.25    0.60
 30  13     0.433     0.418         0.25    0.63      0.26    0.61
 29  13     0.448     0.432         0.26    0.64      0.26    0.63
 28  13     0.464     0.447         0.27    0.66      0.27    0.65
 27  13     0.481     0.463         0.28    0.68      0.29    0.67
 26  13     0.500     0.500         0.30    0.70      0.30    0.70






















Table 6. Characteristics of the distributions of the maximum likelihood and unbiased
estimators, and the probabilities of exclusion from the confidence intervals (Example 3)

                                                 Mean       Probability of exclusion of θ
            Mean              Variance          square      from 95 % confidence interval
                                                 error       Classical       Sequential
   θ     Max.lik. Unbiased  Max.lik.  Unbiased  Max.lik.     >U     <L       >U     <L

 0.05    0.0368   0.0500    0.00298   0.00535   0.00316     0      0.007    0      0.007
 0.10    0.0744   0.1000    0.00531   0.00936   0.00596     0      0.008    0      0.017
 0.15    0.1131   0.1500    0.00709   0.01184   0.00845     0      0.006    0      0.019
 0.20    0.1553   0.2000    0.00904   0.01303   0.01103     0      0.021    0      0.021
 0.30    0.2658   0.3000    0.01496   0.01263   0.01613     0      0.009    0      0.009
 0.40    0.3921   0.4000    0.01399   0.01119   0.01404     0.017  0.014    0.017  0.022
 0.50    0.5000   0.5000    0.01077   0.01066   0.01077     0.026  0.026    0.023  0.023

























A STOCHASTIC MODEL FOR STUDYING THE PROPERTIES 
OF CERTAIN BIOLOGICAL SYSTEMS BY 
NUMERICAL METHODS 


By P. H. LESLIE 


Bureau of Animal Population, Department of Zoological 
Field Studies, Oxford 


CONTENTS
1. Introduction
2. Stochastic model
3. Deterministic models
4. The varieties of stochastic models
5. Models in which the birth-rate remains constant
6. Models in which the death-rate remains constant
7. Some numerical results for a logistic process
8. The chance of extinction in a logistic population


1. INTRODUCTION


Although there is little difficulty in formulating stochastic models for two interacting species 
of living organisms, the intractability of the resulting equations from the mathematical 
point of view is a very serious obstacle to progress (Chin Long Chiang, 1954; Bartlett, 1957). 
In order to study the qualitative properties of some biological system such as that of a 
predator and prey, or that of two competing species, an approach to the problem by way 
of a set of Monte Carlo experiments may be, for the moment, more rewarding, at least until 
some of the difficulties in handling the full theoretical equations have been resolved. The 
following model is very easily adapted for use numerically on an ordinary hand machine, 
or preferably with the help of an electronic computer. It is developed here for the case of a 
single species living alone in a limited environment, namely, a logistic process; for a system 
of two competing species; and also for the predator-prey type of interaction. 


2. STOCHASTIC MODEL 


Suppose that a population of some species, which may be living alone, or interacting with
some other species in a limited environment, consists of $N_t$ individuals at time $t$, and that the
expected change in numbers during the discrete interval of time $t$ to $t+1$ can be defined in
terms of some suitable deterministic model. If, according to this model, we expect $B_t$
births and $D_t$ deaths to occur during the interval, then evidently

$$E(N_{t+1}) = N_t + B_t - D_t. \tag{2.1}$$

If we define
$$E(N_{t+1})/N_t = \lambda_t = e^{r_t}, \tag{2.2}$$
we have from (2.1)
$$\lambda_t = 1 + \beta_t - \delta_t, \tag{2.3}$$

where $\beta_t = B_t/N_t$ is the birth-rate, and $\delta_t = D_t/N_t$ the death-rate, per unit of time, expressed
in terms of the $N_t$ individuals alive at the beginning of the interval.





We may define also an expected birth-rate $b_t$ by

$$B_t = b_t N_t \int_0^1 e^{r_t\tau}\,d\tau,$$

from which
$$\beta_t = \frac{b_t}{r_t}\,(e^{r_t} - 1),$$
or
$$b_t = \beta_t r_t/(\lambda_t - 1).$$

Similarly an expected death-rate $d_t$ may be defined by

$$D_t = d_t N_t \int_0^1 e^{r_t\tau}\,d\tau,$$

or
$$d_t = \delta_t r_t/(\lambda_t - 1);$$

so that, using (2.3),
$$b_t - d_t = r_t.$$

Thus, working in discrete time intervals, we can regard the expected change in numbers
from $N_t$ to $N_{t+1}$ as taking place through the operation of a birth-rate $b_t$ and a death-rate $d_t$,
which remain constant throughout the interval.
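As a small numerical illustration of these definitions, the sketch below converts the expected births and deaths in one interval into the constant rates $b_t$ and $d_t$. The function name and the example figures are hypothetical, and the limiting case $r_t = 0$ (where $b_t = \beta_t$ and $d_t = \delta_t$) is handled separately.

```python
import math

def interval_rates(n_t, births, deaths):
    """Return (lambda_t, r_t, b_t, d_t) for one unit interval.

    n_t    -- individuals alive at the start of the interval
    births -- expected births B_t in the interval
    deaths -- expected deaths D_t in the interval
    Assumes lambda_t > 0, i.e. deaths < n_t + births.
    """
    beta = births / n_t                  # beta_t = B_t / N_t
    delta = deaths / n_t                 # delta_t = D_t / N_t
    lam = 1.0 + (births - deaths) / n_t  # (2.3)
    r = math.log(lam)                    # (2.2)
    # r_t / (lambda_t - 1) tends to 1 as r_t tends to 0
    scale = 1.0 if abs(r) < 1e-12 else r / (lam - 1.0)
    return lam, r, beta * scale, delta * scale

if __name__ == "__main__":
    lam, r, b, d = interval_rates(100, 30, 10)
    print(lam, r, b, d, b - d)  # b - d equals r
```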

If we consider a simple 'birth' and 'death' process, for which the constant rates are
expressed in terms of some convenient unit of time, then according to standard theory
(Kendall, 1949), the mean population size at time $t+1$, given $N_t$ individuals at time $t$, is

$$E(N_{t+1}) = e^{(b_t-d_t)} N_t,$$

and the variance

$$\operatorname{var}(N_{t+1}) = \frac{b_t+d_t}{b_t-d_t}\,\{e^{(b_t-d_t)} - 1\}\,e^{(b_t-d_t)} N_t \quad (b_t \ne d_t),$$
$$\operatorname{var}(N_{t+1}) = 2b_t N_t \quad (b_t = d_t). \tag{2.4}$$

In the case of a species living alone both $b_t$ and $d_t$ will be some functions of $N_t$, and if the
form of these functions is specified, it should be possible to adapt this model for use in a
step-by-step Monte Carlo realization of the process. Thus, in order to simplify matters, we
might assume as an approximation that the distribution of $N_{t+1}$ is normal with $\mu$ and $\sigma^2$
defined by (2.4), subject to the condition that all negative values of $N_{t+1}$ are attributed to
$N_{t+1} = 0$. (It should be noted, however, that this approximation may not be too good in the
region of small $N_t$.) Then, given $N_t$, we could calculate $N_{t+1}$ with the help of a table of random
normal deviates, and the process can be continued with the resulting value of $N_{t+1}$. The
same type of model is also applicable to two interacting species $S_1$ and $S_2$, in which case the
respective $b_1(t)$, $b_2(t)$, $d_1(t)$ and $d_2(t)$ will be some appropriate functions of the $N_1(t)$ and $N_2(t)$
individuals alive in each population at time $t$.
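A single step of such a realization might be sketched as follows. A pseudo-random normal generator stands in for the table of random normal deviates, and the function names are illustrative rather than taken from the paper.

```python
import math
import random

def moments(n_t, b, d):
    """Mean and variance of N_{t+1} from (2.4) for constant rates b and d."""
    r = b - d
    if abs(r) < 1e-12:                   # case b = d
        return n_t, 2.0 * b * n_t
    growth = math.exp(r)
    mean = growth * n_t
    var = (b + d) / r * (growth - 1.0) * growth * n_t
    return mean, var

def step(n_t, b, d, rng):
    """Draw N_{t+1} from the normal approximation; negative draws count as 0."""
    mean, var = moments(n_t, b, d)
    return max(0.0, rng.gauss(mean, math.sqrt(var)))

if __name__ == "__main__":
    rng = random.Random(1)               # seeded so that a run can be repeated
    print(step(100, 1.0083, 0.3151, rng))
```

In a full realization $b$ and $d$ would be recomputed from the current numbers before each call to `step`.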


3. DETERMINISTIC MODELS 


In order to develop this method of approaching the problem of two interacting species, the
deterministic models which give the expected balance of births and deaths during the
interval $t$ to $t+1$ must be expressed in the form

$$N_1(t+1) = F_1\{N_1(t), N_2(t)\}\,N_1(t),$$
$$N_2(t+1) = F_2\{N_1(t), N_2(t)\}\,N_2(t).$$

The simplest equations for the three cases considered here, namely, the logistic model for
a single species, the case of two competing species, and finally the predator-prey type of
interaction, are as follows.




(a) Single species 


The required expression is obtained very easily (Leslie, 1957) from the logistic differential
equation

$$\frac{dN}{dt} = (r - aN)N, \tag{3.1}$$

where $a$ is a positive constant, and $r$ is the difference between a birth-rate $b$ and a death-
rate $d$, and represents the intrinsic rate of increase of the species which would only be
approached if no limitations either of food or space were placed upon the increase in numbers.
The familiar integral of (3.1) is

$$N_t = \frac{K}{1 + Ce^{-rt}} \quad (K = r/a), \tag{3.2}$$

where $K$ is the upper asymptote in numbers and the constant $C$ defines the initial state of
the system. If we write $\lambda = e^r = e^{b-d}$, we have from (3.2) after a little rearrangement,

$$N_{t+1} = \frac{\lambda N_t}{1 + \alpha N_t}, \tag{3.3}$$

where the constant $\alpha = (\lambda - 1)/K$.
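The rearrangement leading from (3.2) to (3.3) can be checked numerically. In the sketch below the recurrence is iterated and compared with the integral form at integer times; the function names and the particular values of $K$, $r$ and $N_0$ are illustrative only.

```python
import math

def logistic_integral(t, K, C, r):
    """N_t from the integral form (3.2)."""
    return K / (1.0 + C * math.exp(-r * t))

def logistic_step(n_t, lam, alpha):
    """One step of the recurrence (3.3)."""
    return lam * n_t / (1.0 + alpha * n_t)

if __name__ == "__main__":
    K, r, n0 = 100.0, math.log(2.0), 15.0
    C = K / n0 - 1.0                 # fixes N_0 = n0 in (3.2)
    lam = math.exp(r)
    alpha = (lam - 1.0) / K
    n = n0
    for t in range(1, 11):
        n = logistic_step(n, lam, alpha)
        print(t, n, logistic_integral(t, K, C, r))
```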


(b) Two competing species

Suppose that if two species $S_1$ and $S_2$ were each living alone in a limited environment,
they would increase in numbers according to a logistic equation (3.3). Then, when both are
competing together in the same environment, we may write

$$N_1(t+1) = \frac{\lambda_1 N_1(t)}{1 + \alpha_1 N_1(t) + \gamma_1 N_2(t)},$$
$$N_2(t+1) = \frac{\lambda_2 N_2(t)}{1 + \alpha_2 N_2(t) + \gamma_2 N_1(t)}, \tag{3.4}$$

where $\lambda_1$ and $\alpha_1$ are the logistic parameters for the species $S_1$ when it is living alone, and
similarly $\lambda_2$ and $\alpha_2$ those for the species $S_2$; while the positive constants $\gamma_1$ and $\gamma_2$ express
the magnitude of the effect which each species has on the rate of increase of the other.

Working in discrete time intervals, the system of equations (3.4) is closely related to the
well-known Lotka-Volterra differential equations for two competing species

$$\frac{dN_1}{dt} = (r_1 - a_1N_1 - b_1N_2)N_1,$$
$$\frac{dN_2}{dt} = (r_2 - a_2N_2 - b_2N_1)N_2. \tag{3.5}$$


For, to take the first member of (3.4) as an example, if from (3.3) we write

$$\alpha_1 = (\lambda_1 - 1)/K_1,$$

and put $\gamma_1 = k\alpha_1$, then

$$N_1(t+1) = \frac{\lambda_1 N_1(t)}{1 + \{(\lambda_1 - 1)/K_1\}\{N_1(t) + kN_2(t)\}}.$$

The value of the parameter $\lambda_1$ depends on the unit of time which has been adopted.
Suppose that for an interval of time $h$ we have $\lambda_1(h) = \lambda_1^h$; then

$$\frac{N_1(t+h) - N_1(t)}{h} = \frac{\lambda_1^h - 1}{h}\,N_1(t)\,\frac{1 - \{N_1(t) + kN_2(t)\}/K_1}{1 + \{(\lambda_1^h - 1)/K_1\}\{N_1(t) + kN_2(t)\}},$$

and as $h \to 0$, this may be replaced by

$$\frac{dN_1}{dt} = (\log_e \lambda_1)\,N_1(t)\left[1 - \frac{N_1}{K_1} - \frac{kN_2}{K_1}\right],$$

which is of the same form as the first member of (3.5), if in the latter we put $\log_e \lambda_1 = r_1$,
$a_1 = r_1/K_1$, and the ratio $b_1/a_1 = k$. The second member of (3.4) is related in the same way
to the second member of (3.5).

The properties of the system (3.4) are similar to those of (3.5). Thus, the latter will have
a stationary state when

$$N_1 = \frac{a_2r_1 - b_1r_2}{a_1a_2 - b_1b_2}, \qquad N_2 = \frac{a_1r_2 - b_2r_1}{a_1a_2 - b_1b_2},$$

and if there is a solution to these equations, $N_1 = L_1$, $N_2 = L_2$, with both $L_1$ and $L_2 > 0$,
then, as is well known, this stationary state will be stable if $a_1a_2 > b_1b_2$, and unstable if
$a_1a_2 < b_1b_2$. Similarly, the system (3.4) will have a stationary state when the denominators
in the equations are equal to $\lambda_1$ and $\lambda_2$ respectively, or when

$$N_1 = \frac{\alpha_2(\lambda_1 - 1) - \gamma_1(\lambda_2 - 1)}{\alpha_1\alpha_2 - \gamma_1\gamma_2}, \qquad N_2 = \frac{\alpha_1(\lambda_2 - 1) - \gamma_2(\lambda_1 - 1)}{\alpha_1\alpha_2 - \gamma_1\gamma_2}. \tag{3.6}$$

Given a solution to these equations with $L_1, L_2 > 0$, this state will be stable when $\alpha_1\alpha_2 > \gamma_1\gamma_2$
and unstable when $\alpha_1\alpha_2 < \gamma_1\gamma_2$. By a suitable choice of the parameters in (3.4) we thus can
construct a numerical system with either a stable or an unstable stationary state. The same
applies to the other possibilities which arise in the case of two competing species, when in (3.6)
one of the solutions is positive and the other negative, with the consequence that one
of the species persists and the other disappears from the system.
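The stationary state (3.6) and its stability criterion can be illustrated numerically. In the sketch below the parameter values are hypothetical, chosen only so that both solutions are positive; applying one step of (3.4) at the stationary state should leave the numbers unchanged.

```python
def competition_step(n1, n2, lam1, lam2, a1, a2, g1, g2):
    """One step of the two-species system (3.4)."""
    q1 = 1.0 + a1 * n1 + g1 * n2
    q2 = 1.0 + a2 * n2 + g2 * n1
    return lam1 * n1 / q1, lam2 * n2 / q2

def stationary_state(lam1, lam2, a1, a2, g1, g2):
    """Solution (L1, L2) of (3.6) and whether alpha1*alpha2 > gamma1*gamma2."""
    det = a1 * a2 - g1 * g2
    L1 = (a2 * (lam1 - 1.0) - g1 * (lam2 - 1.0)) / det
    L2 = (a1 * (lam2 - 1.0) - g2 * (lam1 - 1.0)) / det
    return L1, L2, det > 0.0

if __name__ == "__main__":
    params = dict(lam1=2.0, lam2=2.0, a1=0.01, a2=0.01, g1=0.005, g2=0.005)
    L1, L2, stable = stationary_state(**params)
    print(L1, L2, stable)
    print(competition_step(L1, L2, **params))  # unchanged at the stationary state
```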





(c) The predator-prey relationship

If $S_1$ is a species of prey and $S_2$ the predator, then the familiar Lotka-Volterra differential
equations for the system are

$$\frac{dN_1}{dt} = (r_1 - b_1N_2)N_1,$$
$$\frac{dN_2}{dt} = (-r_2 + b_2N_1)N_2, \tag{3.7}$$


and for a discussion of the properties of a stochastic model based on this classical system 
reference may be made to Bartlett (1957). From the biological point of view, however, these 
equations are not entirely satisfactory as a description of the interaction. In the first place 




no allowance is made in them for any intra-specific competition, and although it is easy to
remedy this by inserting terms in $N_1^2$ and $N_2^2$ in the respective equations, a more serious
deficiency is that no upper limit to the relative rate of increase of the predator is defined in
the second member of (3.7). An alternative set of equations (Leslie, 1948, § 6) is

$$\frac{dN_1}{dt} = (r_1 - a_1N_1 - b_1N_2)N_1,$$
$$\frac{dN_2}{dt} = \left(r_2 - a_2\frac{N_2}{N_1}\right)N_2. \tag{3.8}$$

It will be noted that now, if the prey becomes very numerous and $N_1 \to \infty$, $N_2^{-1}\,dN_2/dt \to r_2$,
the intrinsic rate of increase of the predator; while conversely, when $N_1 \to 0$, $N_2^{-1}\,dN_2/dt \to -\infty$,
corresponding to the disappearance of the predator in the absence of any prey.

Working in discrete time intervals, the set of equations analogous to (3.8) is

$$N_1(t+1) = \frac{\lambda_1 N_1(t)}{1 + \alpha_1 N_1(t) + \gamma_1 N_2(t)},$$
$$N_2(t+1) = \frac{\lambda_2 N_2(t)}{1 + \alpha_2\{N_2(t)/N_1(t)\}}, \tag{3.9}$$

where $\alpha_1$, $\alpha_2$ and $\gamma_1$ are positive constants, and $\log_e \lambda_1 = r_1$ and $\log_e \lambda_2 = r_2$. This system will
have a stationary state when

$$N_1 = \frac{\alpha_2(\lambda_1 - 1)}{\gamma_1(\lambda_2 - 1) + \alpha_1\alpha_2}, \qquad N_2 = \frac{(\lambda_1 - 1)(\lambda_2 - 1)}{\gamma_1(\lambda_2 - 1) + \alpha_1\alpha_2}, \tag{3.10}$$

and an approach to this stable state will in general be made by a series of damped oscillations
(Leslie, 1948).
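The stationary state (3.10) can be confirmed by substituting it back into (3.9). The sketch below does this for one hypothetical set of parameter values, which are not taken from the paper.

```python
def predator_prey_step(n1, n2, lam1, lam2, a1, a2, g1):
    """One step of the discrete predator-prey system (3.9); n1 is the prey."""
    q1 = 1.0 + a1 * n1 + g1 * n2
    q2 = 1.0 + a2 * n2 / n1
    return lam1 * n1 / q1, lam2 * n2 / q2

def predator_prey_stationary(lam1, lam2, a1, a2, g1):
    """Stationary numbers of prey and predator from (3.10)."""
    denom = g1 * (lam2 - 1.0) + a1 * a2
    return a2 * (lam1 - 1.0) / denom, (lam1 - 1.0) * (lam2 - 1.0) / denom

if __name__ == "__main__":
    params = dict(lam1=2.0, lam2=1.5, a1=0.005, a2=2.0, g1=0.01)
    L1, L2 = predator_prey_stationary(**params)
    print(L1, L2)
    print(predator_prey_step(L1, L2, **params))  # unchanged at the stationary state
```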


4. THE VARIETIES OF STOCHASTIC MODELS

These deterministic models (3.3), (3.4) and (3.9), which give for each species the expected
numbers $N_{t+1}$ at time $t+1$, given $N_1(t)$ and $N_2(t)$ individuals at time $t$, express merely the
balance between the birth-rate and the death-rate of the particular species during the
interval. Thus, in each case we have an expression of the form

$$N_a(t+1) = \frac{\lambda_a}{q_a(t)}\,N_a(t) \quad (a = 1, 2), \tag{4.1}$$

where $q_a(t)$ is some function of the numbers in each population at time $t$. For each species,
therefore, we have from (2.3) for the interval $t$ to $t+1$,

$$\lambda_a(t) = \lambda_a/q_a(t) = 1 + \beta_a(t) - \delta_a(t) \quad (a = 1, 2). \tag{4.2}$$

But, in order to calculate the variance of the distribution from (2.4), it is necessary to specify
both the birth-rate and the death-rate during the interval as some functions of $N_1(t)$ and
$N_2(t)$.

The difficulty here is best seen by considering the logistic differential equation,

$$\frac{dN}{dt} = (r - aN)N. \tag{4.3}$$










This may also be written as

$$\frac{dN}{dt} = N\{\psi(N) - \phi(N)\},$$

where the birth-rate function $\psi(N) = b - a_1N$,
and the death-rate function $\phi(N) = d + a_2N$, (4.4)

with $b - d = r$ and $a_1 + a_2 = a$ in (4.3). There are theoretically, therefore, a large number of
possible stochastic models for a logistic process which have the same deterministic equivalent
(Kendall, 1949; Bartlett, 1957). If we fix on some values of $b$ and $d$, taking as usual $b > d > 0$,
these possibilities lie between $a_1 = 0$, $a_2 > 0$, and $a_1 > 0$, $a_2 = 0$ in (4.4); in other words, the
extreme cases for the logistic are when

(1) $\psi(N) = b$, constant,
    $\phi(N) = d + aN$,
and
(2) $\psi(N) = b - aN$ $(0 \le N < b/a)$,
    $\psi(N) = 0$ $(N \ge b/a)$,
    $\phi(N) = d$, constant.

It is to be noted in this last case that since a negative birth-rate is meaningless, this
function is defined only for values of $N$ lying between zero and $b/a$. In the deterministic
model, however, no limitations are placed upon the initial value $N_0$ with which the system
is started and we have, therefore, to define $\psi(N) = 0$ for $N \ge b/a$.

It is evident that the same arguments will apply in the case of the differential equations 
(3-5) and (3-8) for two competing species and for the predator-prey relationship, respectively. 
An almost innumerable variety of different stochastic models for these systems can be 
imagined. Nevertheless, from one point of view, these possibilities are bounded, as it were, 
by the same two extreme cases, namely, when either the birth-rate or the death-rate of each 
species remains constant. We shall consider, therefore, the numerical development of this 
model in terms of these two limits. 


5. MODELS IN WHICH THE BIRTH-RATE REMAINS CONSTANT 


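We have from (4.1) and (4.2), dropping the suffix $a$ and confining our attention to one of
the species,

$$\lambda_t = \frac{\lambda}{q_t}, \tag{5.1}$$

where the constant
$$\lambda = e^r = e^{b-d}, \tag{5.2}$$

and hence, since the birth-rate is assumed to remain constant,
$$\log_e \lambda_t = b - d_t = r_t. \tag{5.3}$$

We may therefore write (2.4) in the form

$$E(N_{t+1}) = \lambda_t N_t, \qquad \operatorname{var}(N_{t+1}) = \phi\,E(N_{t+1}), \tag{5.4}$$

where
$$\phi = \left(\frac{2b}{r_t} - 1\right)(\lambda_t - 1) \quad (r_t \ne 0),$$
$$\phi = 2b \quad (r_t = 0). \tag{5.5}$$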







In designing a Monte Carlo experiment based on this model it is necessary in the
first place to decide on some suitable numerical values of $\lambda$ and $b$ in (5.2). One way
of proceeding would then be to tabulate the function $\phi$ in (5.4) over a range of possible
values of $\lambda_t$ (or its natural logarithm $r_t$). However, if we are free to choose any
arbitrary values of the parameters $\lambda$ and $b$, this expression for the variance can be
greatly simplified.

A particular value of $\lambda = e^{b-d}$ is compatible with a range of possible values of $b$ and $d$
$(b > d)$; but it appears that for certain combinations of $\lambda$ and the ratio $b/d = k > 1$, the
function $\phi$ remains comparatively stable over a wide range of positive and negative values
of $r_t$. Thus, the results of a rough preliminary calculation suggested that for $2.0 \le \lambda \le 2.5$
there would be a value of $k$ for which $\phi$ remained approximately constant over a range of
the argument $r \ge r_t \ge -r$. For instance, the following are the values of $\phi = f(r_t)$ for the stated
combinations of $\lambda$ and $k$.

 λ = 2.0, k = 3.2      λ = 2.25, k = 5.0      λ = 2.5, k = 9.0
   r_t       φ            r_t       φ            r_t       φ
  0.693     1.91         0.811     1.87         0.916     1.87
  0.493     1.97         0.611     1.95         0.716     1.96
  0.293     2.00         0.411     2.00         0.516     2.02
  0.093     2.01         0.211     2.02         0.316     2.05
  0         2.02         0.111     2.02         0.116     2.06
 -0.093     2.02         0         2.03         0         2.06
 -0.293     2.00        -0.111     2.02        -0.116     2.06
 -0.493     1.98        -0.211     2.02        -0.316     2.04
 -0.693     1.95        -0.411     2.00        -0.516     2.01
   —         —          -0.611     1.97        -0.716     1.98
   —         —          -0.811     1.95        -0.916     1.95


























It will be seen that in all three cases we have $\phi \approx 2.0$ over a relatively wide range of possible
values of $r_t$, and we can infer that the same approximation will hold for any $\lambda$, $2.0 \le \lambda \le 2.5$,
and some value of $k$ lying between 3.2 and 9.0. Since no very high degree of numerical
accuracy will be required in carrying out the computations by means of (5.4), and assuming
as an approximation that we are dealing with a normal distribution, it is evident that a
great deal of time will be saved at each step in the calculations by adopting values of $\lambda$
within this range, together with the appropriate value of $k = b/d$, and using the approximation
$\phi = 2$ in the expression for the variance.
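The tabulated values of $\phi$ follow directly from (5.5) once $b$ and $d$ have been recovered from $\lambda$ and $k$ by means of $b - d = \log_e \lambda$ and $b/d = k$. The sketch below recomputes the first column of the table; the function names are illustrative.

```python
import math

def rates_from_lambda_k(lam, k):
    """Solve b - d = log(lam), b / d = k for the constant rates (b, d)."""
    d = math.log(lam) / (k - 1.0)
    return k * d, d

def phi(r_t, b):
    """Variance factor (5.5) for the model with constant birth-rate b."""
    if abs(r_t) < 1e-12:
        return 2.0 * b
    return (2.0 * b / r_t - 1.0) * math.expm1(r_t)  # expm1(r_t) = lambda_t - 1

if __name__ == "__main__":
    lam, k = 2.0, 3.2
    b, d = rates_from_lambda_k(lam, k)
    r = math.log(lam)
    for r_t in (r, 0.493, 0.293, 0.093, 0.0, -0.093, -0.293, -0.493, -r):
        print(round(r_t, 3), round(phi(r_t, b), 2))
```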

Then, to summarize this model in which it is assumed that the birth-rate of each species
remains constant, we calculate the expected numbers at time $t+1$ by means of

$$E[N_a(t+1)] = \frac{\lambda_a}{q_a(t)}\,N_a(t) \quad (a = 1, 2), \tag{5.6}$$

and assume that $N_a(t+1)$ is distributed normally with variance

$$\operatorname{var}[N_a(t+1)] = 2E[N_a(t+1)], \tag{5.7}$$










provided we are able to choose values of $\lambda_a$ within the prescribed range; and where, for the
three systems considered here, we have

Single species (logistic):
    $q(t) = 1 + \alpha N(t)$.
Two competing species, $S_1$ and $S_2$:
    $q_1(t) = 1 + \alpha_1N_1(t) + \gamma_1N_2(t)$,
    $q_2(t) = 1 + \alpha_2N_2(t) + \gamma_2N_1(t)$.                          (5.8)
Predator-prey ($S_1$ prey: $S_2$ predator):
    $q_1(t) = 1 + \alpha_1N_1(t) + \gamma_1N_2(t)$,
    $q_2(t) = 1 + \alpha_2\,N_2(t)/N_1(t)$.

In each of these three cases the values of the constants $\alpha_1$, $\alpha_2$, $\gamma_1$ and $\gamma_2$, given $\lambda_1$ and $\lambda_2$,
are chosen so as to give some convenient stationary state of the particular system, according
to the equations (3.3), (3.6) and (3.10).

This approximation for $\operatorname{var}[N_a(t+1)]$ will be found to hold over most of the ranges of
$q_a(t)$ which will occur in practice for one of these hypothetical populations. Thus the
assumption that $\phi = 2$ for $r \ge r_t \ge -r$ in (5.5) is equivalent to saying that for a given value of $\lambda_a$
within the prescribed range, $1 \le q_a(t) \le \lambda_a^2$. For instance, in the case of the logistic, having
chosen a value of $\alpha$ so as to give a stationary state $K = (\lambda - 1)/\alpha$, this means that the
approximation will hold for all values of $N$ lying between zero and $N = (\lambda + 1)K$, a range
which is ample for all practical purposes. It is more difficult in the cases of two competing
species and of the predator-prey relationship to define these limits concisely in terms of
$N_1(t)$ and $N_2(t)$; but, for either of these systems, given any reasonable values of $N_1(0)$ and
$N_2(0)$ in relation to the assumed stationary state of the particular system, experience has
shown that in the development of the resulting process, $q_a(t)$ remains less, and usually much
less, than $\lambda_a^2$ in each case.*
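A realization of the logistic process under (5.6) and (5.7) might be sketched as follows. A pseudo-random normal generator replaces the table of random normal deviates, the numbers are left unrounded, and a population that reaches zero is kept at zero; these choices, like the function name, are assumptions of the sketch rather than statements of the procedure actually used.

```python
import math
import random

def simulate_logistic(n0, lam, alpha, steps, rng):
    """Realization of the logistic process with constant birth-rate.

    Each step draws N(t+1) from a normal distribution with mean
    lam * N / (1 + alpha * N), as in (5.6), and variance twice the mean,
    as in (5.7); negative draws are set to zero.
    """
    path = [float(n0)]
    for _ in range(steps):
        n = path[-1]
        if n <= 0.0:                 # extinct: the population stays at zero
            path.append(0.0)
            continue
        mean = lam * n / (1.0 + alpha * n)
        path.append(max(0.0, rng.gauss(mean, math.sqrt(2.0 * mean))))
    return path

if __name__ == "__main__":
    rng = random.Random(0)           # seeded so that a run can be repeated
    # lam = 2.0 and alpha = 0.01 give the stationary state K = 100
    print(simulate_logistic(15, 2.0, 0.01, 30, rng))
```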


* In the case of two competing species this approximation should hold at any point on the $(N_1, N_2)$
plane which lies below and to the left of the boundaries formed by the intersecting straight lines
$q_1 = \lambda_1^2$, $q_2 = \lambda_2^2$, where $q_1$ and $q_2$ are defined in (5.8).


6. MODELS IN WHICH THE DEATH-RATE REMAINS CONSTANT

In this case the development of the model in numerical terms is somewhat more complicated.
Given a value $\lambda = e^r = e^{b-d}$, then for the interval of time $t$ to $t+1$,

$$\log_e \lambda_t = r_t = b_t - d,$$

where $b_t$ is now a function of $N_1(t)$ and $N_2(t)$, and $d$ for the particular species remains constant.
Because a negative birth-rate is meaningless, we therefore have to define

$$\lambda_t = \lambda/q_t \quad (1 \le q_t < e^b),$$

since $b_t = 0$ when $q_t = e^b$; and

$$\lambda_t = e^{-d} \quad (q_t \ge e^b).$$

Hence, corresponding to (5.4) and (5.5), we have

$$E(N_{t+1}) = \lambda_t N_t, \qquad \operatorname{var}(N_{t+1}) = \phi'\,E(N_{t+1}),$$








where from (2.4)

$$\phi' = (1 + 2d/r_t)(\lambda_t - 1) \quad (r_t \ne 0,\ r \ge r_t > -d),$$
$$\phi' = 2d \quad (r_t = 0),$$
$$\phi' = 1 - e^{-d} \quad (r_t = -d).$$

It is evident that in this case there will be no constant value of $\phi'$ for some combination
of $\lambda$ and $b/d = k$. In any numerical experiment using this type of model, however, the
value of $\phi'$ can always be tabulated for given $\lambda$ and $d$ between the limits $\lambda \ge \lambda_t \ge e^{-d}$. As
this model in which the death-rate remains constant is likely to be used as a contrast to
that in which the birth-rate remains constant, it is of interest, therefore, to tabulate $\phi'$ for
the same values of $\lambda$ and $k$ as were illustrated in the previous section. Thus we have the
following values of $b$ and $d$ for the given $\lambda$ and $b/d = k$.











   λ       k       b        d
  2.00    3.2    1.0083   0.3151
  2.25    5.0    1.0137   0.2028
  2.50    9.0    1.0308   0.1145

















Taking these values of $\lambda$ and $d$, we have the following tables of $\phi' = f(\lambda_t)$, ending in each case
with $\lambda_t = e^{-d}$.

 λ = 2.0, k = 3.2      λ = 2.25, k = 5.0      λ = 2.5, k = 9.0
   λ_t       φ'           λ_t       φ'           λ_t       φ'
  2.0      1.9092        2.25     1.8752        2.5      1.8749
  1.8      1.6577        2.05     1.6433        2.3      1.6574
  1.6      1.4045        1.85     1.4104        2.1      1.4395
  1.4      1.1492        1.65     1.1765        1.9      1.2211
  1.2      0.8913        1.45     0.9412        1.7      1.0021
  1.1      0.7612        1.25     0.7044        1.5      0.7824
  1.0      0.6302        1.05     0.4657        1.3      0.5619
  0.9      0.4981        1.00     0.4056        1.1      0.3403
  0.8      0.3648        0.95     0.3453        1.0      0.2290
   —         —           0.85     0.2243        0.9      0.1173
  0.7297   0.2703        0.8165   0.1835        0.8918   0.1082





In all three cases $\phi'$ is very nearly a linear function of $\lambda_t$. In fact, working to two decimal
places which should be sufficient for all ordinary purposes, a very good approximation to
$\phi'$ is given by the following straight lines:

$\lambda = 2.00$:  $\phi' = -0.66 + 1.29\lambda_t$,
$\lambda = 2.25$:  $\phi' = -0.77 + 1.18\lambda_t$,          (6.1)
$\lambda = 2.50$:  $\phi' = -0.87 + 1.10\lambda_t$.
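The closeness of the straight lines (6.1) to the exact function can be examined by a short computation. The sketch below evaluates the exact expression for $\phi'$ and the corresponding line over the range $e^{-d} \le \lambda_t \le \lambda$; the function and variable names are illustrative.

```python
import math

# lambda -> (intercept, slope) from (6.1) and the tabulated constant death-rate d
LINES = {
    2.00: (-0.66, 1.29, 0.3151),
    2.25: (-0.77, 1.18, 0.2028),
    2.50: (-0.87, 1.10, 0.1145),
}

def phi_prime_exact(lam_t, d):
    """Exact variance factor for the model with constant death-rate d."""
    if abs(lam_t - 1.0) < 1e-12:     # r_t = 0
        return 2.0 * d
    r_t = math.log(lam_t)
    return (1.0 + 2.0 * d / r_t) * (lam_t - 1.0)

def phi_prime_linear(lam_t, lam):
    """Straight-line approximation (6.1) for the chosen lambda."""
    intercept, slope, _ = LINES[lam]
    return intercept + slope * lam_t

if __name__ == "__main__":
    for lam, (_, _, d) in LINES.items():
        lo = math.exp(-d)
        worst = max(
            abs(phi_prime_exact(x, d) - phi_prime_linear(x, lam))
            for x in (lo + (lam - lo) * j / 20.0 for j in range(21))
        )
        print(lam, round(worst, 3))
```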







Thus, in a numerical realization of this model, we might adopt one or other of these values
of $\lambda$ and $d$, and calculate the expected numbers at time $t+1$ by means of

$$E[N_a(t+1)] = \frac{\lambda_a}{q_a(t)}\,N_a(t) = \lambda_a(t)\,N_a(t) \quad (\lambda_a \ge \lambda_a(t) > e^{-d_a}),$$
$$E[N_a(t+1)] = e^{-d_a}\,N_a(t) \quad (q_a(t) \ge e^{b_a}) \qquad (a = 1, 2),$$

and                                                                              (6.2)

$$\operatorname{var}[N_a(t+1)] = f[\lambda_a(t)]\,E[N_a(t+1)] \quad (\lambda_a \ge \lambda_a(t) > e^{-d_a}),$$
$$\operatorname{var}[N_a(t+1)] = (1 - e^{-d_a})\,E[N_a(t+1)] \quad (q_a(t) \ge e^{b_a}),$$

where the $q_a(t)$ for the three types of system are the same as those given in the previous
section (equations (5.8)), and the functions $\phi' = f[\lambda_a(t)]$, in the expression for the variance,
for three values of $\lambda$ are given by (6.1).
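The two branches of (6.2) can be combined in a single routine, as in the following sketch. The function name and argument list are illustrative; the intercept and slope are those of the line in (6.1) appropriate to the chosen $\lambda$, and $b$ is recovered from $b = \log_e \lambda + d$.

```python
import math

def death_constant_moments(n, q, lam, d, intercept, slope):
    """Mean and variance of N(t+1) from (6.2) for one species.

    n   -- numbers N(t) at the start of the interval
    q   -- the value q(t) from (5.8)
    lam -- the parameter lambda, with b - d = log(lam)
    d   -- the constant death-rate
    intercept, slope -- the straight line of (6.1) for this lambda
    """
    b = math.log(lam) + d
    if q >= math.exp(b):             # birth-rate has fallen to zero
        mean = math.exp(-d) * n
        return mean, (1.0 - math.exp(-d)) * mean
    lam_t = lam / q
    mean = lam_t * n
    return mean, (intercept + slope * lam_t) * mean

if __name__ == "__main__":
    # lambda = 2.00, d = 0.3151, with the first line of (6.1)
    print(death_constant_moments(10, 1.0, 2.0, 0.3151, -0.66, 1.29))
    print(death_constant_moments(10, 5.0, 2.0, 0.3151, -0.66, 1.29))
```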

Some numerical results obtained by using these two types of model are given in the next 


section for a logistic process, while the results for a system of two competing species will 
be published in a later paper. 


7. SOME NUMERICAL RESULTS FOR A LOGISTIC PROCESS 


The question of the relationship between the possible types of stochastic model and their 
deterministic equivalent arises in an analysis which has been given (Leslie, 1957) of some 
replicated experiments carried out by Gause (1934) with populations of the Protozoa, 
Paramecium aurelia and P. caudatum. The size of these populations, when each species 
was living alone, became fairly large, the number of individuals in each replicate increasing 
from 20 initially to around 2000-6000 when they were in the region of their stationary state. 
It was assumed in the analysis that the changes in the mean values of these processes could 
be described adequately in terms of a deterministic model, and it was shown that in the 
case of both species living alone a logistic equation gave a satisfactory fit to the mean values, 
as judged by the degree of variation observed between the replicates. But it was not at all 
clear at the time the analysis was made, what the relation would be between the parameters 
of a logistic fitted empirically in this way and the true parameters of the process, assuming 
this was a random logistic. Moreover, it was completely unknown whether the decrease in 
the relative rate of increase in numbers of these populations, as they approached the sta- 
tionary state, was due to a decrease in the rate of division of the individuals, i.e. to a reduc- 
tion in the birth-rate, or to an increase in the death-rate, or to some combination of these 
factors. A set of experiments, therefore, was carried out with the two extreme types of 
logistic model, in order to see how the mean values of these random processes behaved in 
relation to the deterministic model as the value of the upper asymptote K increased in 
magnitude. 
In the deterministic model

$$
E(N_{t+1}) = \frac{\lambda N_t}{1 + \alpha N_t} \qquad (7.1)
$$

the value of $\lambda = 2.0$ was adopted, and in turn $\alpha$ was taken as 0.01 and 0.0005, so that from
(3.3) the upper asymptotes in numbers were K = 100 and 2000, respectively. The initial
numbers for the two populations were $N_0 = 15$ and 300, giving the same relative difference
between $N_0$ and K in each case. Each population was then assumed to be subject, first to
a constant birth-rate and variable death-rate (B.R.C. model), and secondly to a variable
birth-rate and constant death-rate (D.R.C. model). There were ten replicates of each type of
population, and starting at an origin of time, the processes were calculated up to t = 10.
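The asymptotes quoted above follow from the fixed point of the deterministic recurrence. The short derivation below is added for convenience and uses only the symbols of (7.1); it is a restatement, not a quotation, of the result cited as (3.3).

```latex
% Fixed point of (7.1): put N_{t+1} = N_t = K with K \neq 0.
K = \frac{\lambda K}{1 + \alpha K}
\;\Longrightarrow\; 1 + \alpha K = \lambda
\;\Longrightarrow\; K = \frac{\lambda - 1}{\alpha},
\qquad
\frac{2.0 - 1}{0.01} = 100, \qquad \frac{2.0 - 1}{0.0005} = 2000.
% Relative starting points are equal: 15/100 = 300/2000 = 0.15.
```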

Since the purpose of these experiments was to compare the types of model and the effect 
of increasing the size of the populations, the same set of random normal deviates was used 
throughout. Thus, a block of 10 × 10 deviates in units of $\sigma$ was taken quite arbitrarily from
a convenient table given by Deming (1944, Appendix), and it was assumed that a particular 
replicate, no. 1 for instance, was subject to the same sequence of deviates in each set of 
experiments. Any differences between the results, therefore, can be attributed either to 
the size of the populations, or to the type of model which was used. 


Table 1. The mean values of the ten replicates

Values of $N_t$, with range in parentheses.

              K = 100                             K = 2000
 t    B.R.C.           D.R.C.            B.R.C.                 D.R.C.

 0    15.0             15.0              300.0                  300.0
 1    27.5 (15-40)     27.2 (17-38)      529.0 (475-585)        528.2 (480-578)
 2    45.9 (24-77)     44.9 (27-70)      849.9 (758-985)        846.4 (766-962)
 3    63.6 (34-97)     62.3 (39-88)      1200.2 (1061-1352)     1194.8 (1083-1323)
 4    69.8 (43-119)    71.4 (50-101)     1473.7 (1355-1698)     1477.8 (1379-1645)
 5    85.7 (49-108)    85.9 (59-99)      1718.8 (1556-1832)     1713.7 (1596-1802)
 6    83.8 (53-113)    87.5 (65-106)     1815.2 (1679-1935)     1825.8 (1735-1901)
 7    85.3 (60-112)    90.3 (77-105)     1881.8 (1770-1996)     1896.7 (1836-1962)
 8    91.9 (64-126)    95.9 (80-113)     1948.3 (1824-2082)     1952.3 (1883-2029)
 9    86.8 (70-98)     93.0 (83-99)      1936.4 (1858-1993)     1952.6 (1912-1985)
10    91.0 (67-120)    95.4 (83-113)     1961.2 (1870-2094)     1972.4 (1922-2048)

B.R.C. = constant birth-rate model. D.R.C. = constant death-rate model.


The calculations were carried out on an ordinary hand-machine, and no high degree of
numerical accuracy was attempted. Thus, $E(N_{t+1})$ was calculated to the nearest integer,
from which $\sigma^2$ and $\sigma$ were obtained. The product of $\sigma$ and the random normal deviate
was taken to the nearest integer and added to, or subtracted from, $E(N_{t+1})$, according to the
sign of the deviate which had been drawn.
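The rounding procedure just described can be sketched as follows. This is an illustrative reading rather than a record of the original computation: it assumes that $\sigma^2$ is formed as a variance factor (such as $\phi'$ in (6.2)) multiplied by the rounded expectation, and the names `nearest_int` and `next_census`, together with the numerical inputs, are hypothetical.

```python
import math


def nearest_int(x):
    """Round a non-negative number half up (Python's round() rounds halves to even)."""
    return math.floor(x + 0.5)


def next_census(mean, var_factor, deviate):
    """One step of a replicate, given the expectation, variance factor and deviate."""
    expected = nearest_int(mean)                 # E(N_{t+1}) to the nearest integer
    sigma = math.sqrt(var_factor * expected)     # sigma from the rounded expectation
    step = nearest_int(abs(sigma * deviate))     # |sigma x deviate| to the nearest integer
    return expected + step if deviate >= 0 else expected - step


print(next_census(26.087, 1.5835, 1.0))    # deviate of +1.0 in units of sigma
print(next_census(26.087, 1.5835, -0.5))   # deviate of -0.5 in units of sigma
```

Rounding the magnitude first and then applying the sign mirrors the wording "added to, or subtracted from", although other conventions would differ by at most one individual.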

The results are presented in Table 1, where the mean values of the ten replicates in each
set are given up to t = 10, together with the observed range of the individual $N_t$ in parentheses.
It will be seen that in the later stages of the growth in numbers the mean values of
these processes are tending to settle down to a level which is less than the deterministic 
asymptote. The relative difference between these levels and the upper asymptote is greater 
when K = 100 than when K = 2000; and in each case the model with the death-rate con- 
stant approaches nearer to K than the constant birth-rate model. A difference in this 
direction between the mean values of a stochastic logistic process around the stationary state 
and the asymptote of the deterministic model is, however, to be expected theoretically 
(Feller, 1939). 







A logistic curve was then fitted to each series of mean numbers in Table 1, by the same
method as that used in the original analysis of Gause's data (Leslie, 1957, § 4). As a result
the following estimates of the parameters $\lambda$, $\alpha$ and $K = (\lambda - 1)/\alpha$ in (7.1) were obtained.

















Type of model      $\lambda$        $\alpha$            K

B.R.C.             2.349      0.01534       87.9
D.R.C.             2.177      0.01257       93.6
(True values)      (2.000)    (0.01000)     (100.0)

B.R.C.             2.052      0.0005362     1962
D.R.C.             2.039      0.0005256     1977
(True values)      (2.000)    (0.0005000)   (2000)











It will be seen that in the first pair of experiments, when K = 100, the estimates of the
parameters $\lambda$ and $\alpha$ differ quite appreciably from the true values. These differences, however,
are less in the case of the constant death-rate model, which is presumably due to the
smaller variance of this model as $N_t$ approaches K. When K = 2000, the parameters of the
empirical logistics approach much closer to the true values, and the differences between the
two types of model are very much less.

These logistics appear to give an excellent fit to the observed series of means in Table 1.
For instance, to take as examples the two more extreme estimates of $\lambda$ and $\alpha$ in the table
above, we have the following expected values compared with those observed, given the
initial values of the processes in each case.














        B.R.C. model:                                       B.R.C. model:
        $\lambda$ = 2.349, $\alpha$ = 0.01534, $N_0$ = 15.0      $\lambda$ = 2.052, $\alpha$ = 0.0005362, $N_0$ = 300.0
 t
        Observed     Expected                               Observed     Expected

 1      27.5         28.6                                   529.0        530.3
 2      45.9         46.7                                   849.9        847.3
 3      63.6         63.9                                   1200.2       1195.5
 4      69.8         75.8                                   1473.7       1494.9
 5      85.7         82.3                                   1718.8       1702.7
 6      83.8         85.4                                   1815.2       1826.4
 7      85.3         86.8                                   1881.8       1893.5
 8      91.9         87.5                                   1948.3       1928.0
 9      86.8         87.8                                   1936.4       1945.2
10      91.0         87.9                                   1961.2       1953.8
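The expected values in this comparison can be regenerated by iterating the deterministic recurrence (7.1) from the stated initial numbers with the fitted parameters. The sketch below does this; the function name `logistic_series` is illustrative, and small differences in the last figure are to be expected because the fitted parameters are quoted to four significant figures only.

```python
def logistic_series(lam, alpha, n0, steps=10):
    """Iterate N_{t+1} = lam * N_t / (1 + alpha * N_t) and return N_1, ..., N_steps."""
    series, n = [], n0
    for _ in range(steps):
        n = lam * n / (1.0 + alpha * n)
        series.append(n)
    return series


small = logistic_series(2.349, 0.01534, 15.0)       # fitted B.R.C. curve, K = 100 series
large = logistic_series(2.052, 0.0005362, 300.0)    # fitted B.R.C. curve, K = 2000 series

for t, (a, b) in enumerate(zip(small, large), start=1):
    print(f"{t:2d}  {a:6.1f}  {b:7.1f}")
```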























The main conclusions which one would come to as a result of these experiments are that 
the mean values of a random logistic can be fitted by an ordinary logistic curve, and that the 
estimates of the parameters for these empirical logistics gradually approach the true values 
of the process as numbers increase in magnitude. For large populations we might conclude 
that, to a fairly close approximation, the deterministic model is for all practical purposes 










the same as the mean of the stochastic model. Moreover, it appears that with populations 
of the size observed by Gause, the differences between the possible types of stochastic model 
in the case of a logistic process are not of any very great importance. 


8. THE CHANCE OF EXTINCTION IN A LOGISTIC POPULATION 


Recently, Bartlett (1957) has discussed the question of the chance of extinction in a 
random logistic process. He points out that although from the theory of finite Markov 
chains the ultimate state of such a system is N = 0, nevertheless, if we consider the variance 
of a small deviation from the upper asymptote (or, more strictly speaking, a quantity which 
he defines as $\gamma/\alpha$, and which is equal to twice the variance of the deviation), then provided
this quantity is small, the chance of extinction may be neglected for any given time interval. 
He suggests that under such conditions the population will continue to show fluctuations 
with this variance. Since none of the processes calculated in the previous section showed any 
tendency to drift towards the absorbing barrier N = 0, and appeared in their later stages to 
be approaching some steady state, it is therefore of interest to relate the numerical results 
obtained by means of the models used here with the quantities expected from the more 
formal theoretical development given by Bartlett.

Bartlett considers the asymptotic situation when $N \sim K$; or, more precisely, when there
is a deviation $u$ from the upper asymptote K, defined by

$$ u = (N - K)/K. \qquad (8.1) $$

He then shows that in the stochastic model we have for small $u$

$$ \operatorname{var}(u) = \tfrac{1}{2}\,\gamma/\alpha', \qquad (8.2) $$

where a prime has been attached to his symbol $\alpha$ in order to distinguish it from the $\alpha$ used
here in the case of a logistic process. If we write the birth-rate and death-rate functions,
respectively, as (cf. equations (4.4))

$$ Y(N) = b - \alpha_1 N, \qquad g(N) = d + \alpha_2 N, \qquad (8.3) $$

with $b - d = r$, $\alpha_1 + \alpha_2 = \alpha$, and $K = r/\alpha$ in the deterministic model

$$ \frac{dN}{dt} = (r - \alpha N)N, $$

then we have in (8.2), according to the definitions given by Bartlett,

$$ \gamma = (b + d)/K + \alpha_2 - \alpha_1, \qquad \alpha' = b - d. $$

Thus, if we take the extreme cases of either the birth-rate function or the death-rate function
remaining constant in the stochastic model (i.e. when in (8.3) $\alpha_1 = 0$, and $\alpha_2 = 0$,
respectively), we should have for the

$$
\begin{aligned}
\text{B.R.C. model,}\quad \operatorname{var}(u) &= \frac{b}{K(b - d)},\\
\text{D.R.C. model,}\quad \operatorname{var}(u) &= \frac{d}{K(b - d)}.
\end{aligned}
\qquad (8.4)
$$










It is not easy to say in the case of developing systems, such as those illustrated in the
previous section, exactly from what point onwards in time these processes can be regarded
as approximating to their steady state. But suppose we assume that at the last three
calculated 'censuses', at t = 8, 9 and 10, these numerical realizations were on the average
in the neighbourhood of this state. Then, from the figures for the ten individual replicates
in each case, we can estimate by an ordinary analysis of variance the residual var(N),
which in these examples will be based on 18 d.f., after eliminating the sums of squares for
'between times' and 'between replicates'. Since from (8.1) we have $\operatorname{var}(u) = (1/K^2)\operatorname{var}(N)$,
we can thus obtain estimates of this variance from the observed numbers, which can be
compared with those expected from (8.4), given the numerical values of the parameters
adopted in these processes, viz. b = 1.0083, d = 0.3151 ($\lambda$ = 2.0). The results of the calculations
were as follows:






































                               var (u)                   Ratio (B.R.C.)/(D.R.C.)
K       Type of model
                        Estimated      Expected          Estimated     Expected

100     B.R.C.          0.01781        0.01455           3.04          3.20
        D.R.C.          0.005852       0.004546
2000    B.R.C.          0.0008951      0.0007273         3.17          3.20
        D.R.C.          0.0002825      0.0002273
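The residual variance referred to above is the interaction mean square of a two-way layout (times by replicates) with one observation per cell. A minimal sketch of that calculation is given below; the function name `residual_variance` and the small data set are purely illustrative, since the individual replicate figures are not reproduced in the text.

```python
def residual_variance(rows):
    """Residual mean square and its d.f. for a two-way layout, one value per cell.

    rows[i][j] is the count at census i for replicate j.
    """
    r, c = len(rows), len(rows[0])
    grand = sum(sum(row) for row in rows) / (r * c)
    row_means = [sum(row) / c for row in rows]
    col_means = [sum(rows[i][j] for i in range(r)) / r for j in range(c)]
    ss = sum((rows[i][j] - row_means[i] - col_means[j] + grand) ** 2
             for i in range(r) for j in range(c))
    df = (r - 1) * (c - 1)
    return ss / df, df


# Hypothetical 2 x 2 example, small enough to verify by hand.
ms, df = residual_variance([[1, 2], [3, 5]])
print(ms, df)

# With 3 censuses and 10 replicates the residual has (3 - 1)(10 - 1) = 18 d.f.
K = 100
var_n = 178.1                 # illustrative residual var(N)
print(var_n / K ** 2)         # var(u) = var(N) / K^2
```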





It is evident that these variances are of much the same order as those expected. All four
are somewhat greater than expectation; but these estimates of var(u) cannot be entirely
independent, since the same block of random normal deviates was used in calculating each set
of processes. If we were to test any single one by means of a $\chi^2$ test, the agreement with
expectation would be regarded as satisfactory (and even the total $\chi^2$ is not excessive). An
important point is that it follows from (8.4) that the ratio of var(u) for the two types of
model should be equal to b/d, the ratio between the assumed values of the constant birth-rate
and the constant death-rate. It will be seen that the values of this ratio determined
from the estimated var(u) are very close to the true value of 3.20. Considering the approximations
which have been made, not only in carrying out the actual computations, but
also in developing these types of stochastic model, particularly in regard to the estimates
of $\operatorname{var}(N_{t+1})$ in §§ 5 and 6, this surprisingly good agreement with expectation is most
encouraging.
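The expected variances and the $\chi^2$ comparison mentioned above can be reproduced from (8.4) and the quoted estimates. In the sketch below the $\chi^2$ statistic is taken, as an assumption about the test intended, to be 18 times the ratio of the estimated to the expected variance; the names `expected_var_u` and `estimated` are illustrative.

```python
b, d = 1.0083, 0.3151          # rates adopted in the text (lambda = 2.0)


def expected_var_u(K, constant_rate):
    """(8.4): constant_rate is b for the B.R.C. model and d for the D.R.C. model."""
    return constant_rate / (K * (b - d))


estimated = {
    (100, "B.R.C."): 0.01781, (100, "D.R.C."): 0.005852,
    (2000, "B.R.C."): 0.0008951, (2000, "D.R.C."): 0.0002825,
}

for (K, model), est in estimated.items():
    expected = expected_var_u(K, b if model == "B.R.C." else d)
    chi2 = 18 * est / expected      # residual mean square on 18 d.f. against expectation
    print(K, model, f"{expected:.7f}", f"{chi2:.2f}")

print("b/d =", round(b / d, 2))
```

Each statistic comes out at roughly 22 to 23, against a 5 % point of about 28.9 for $\chi^2$ on 18 d.f., which agrees with the statement that any single comparison would be judged satisfactory.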

It has been stated that in the populations with the smaller numbers and greatest variance
(B.R.C. model: K = 100) there was no evidence of any tendency for the number of individuals
to approach zero, once the replicates had surmounted the early stages of their growth in
numbers. The steady state for this stochastic model appeared to be $N \sim 90$, and the standard
deviation of random fluctuations about this state was estimated from the observed numbers
to be $s \sim \sqrt{178} = 13.3$. That is to say, in terms of a normal distribution, we would have for
any given time interval $P(N < 56) \sim 5 \times 10^{-3}$, $P(N < 46) \sim 5 \times 10^{-4}$ and $P(N = 0) < 5 \times 10^{-10}$.
Clearly, the chance that a replicate in the region of N = 90 will fall below N = 50 is small,
while the chance of extinction can obviously be neglected for any given time interval. It
seems reasonable to conclude, therefore, that in this particular system the chance of a
replicate becoming zero, once it is in the region of its stationary state, is negligibly small.*
Thus, we might expect that under certain conditions, even quite small populations of a
single species, of the order of 100 individuals, could continue to fluctuate about some steady
state for comparatively lengthy periods of time without showing any tendency to become
extinct.
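The orders of magnitude quoted for these probabilities can be checked with the normal distribution function. The sketch below assumes a normal variate with mean 90 and variance 178, as in the text; the function name `normal_cdf` is illustrative, and the probability at zero is taken as the lower tail up to N = 0.

```python
import math


def normal_cdf(x, mean, sd):
    """Lower-tail probability of a normal variate, via the complementary error function."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0)))


mean, sd = 90.0, math.sqrt(178.0)

for level in (56, 46, 0):
    print(level, f"{normal_cdf(level, mean, sd):.2e}")

# A probability bound of 5e-10 corresponds to a passage time of at least 1 / 5e-10 units.
print(f"{1 / 5e-10:.1e}")
```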

The most striking evidence that relatively small populations of a single species can persist 
almost indefinitely under stable environmental conditions is shown by the results of Park’s 
experiments with the flour beetles, Tribolium confusum and T. castaneum (Park, 1954).
Replicated populations of these two species were observed when each was living alone in 
a fixed volume of flour, which was renewed every 30 days, and under six different constant 
conditions as regards temperature and relative humidity. Taking the five varieties of 
physical conditions in these experiments under which both species could continue to flourish, 
it appears from the figures given by Park (1954; Table 3, Treatments I-V) that the mean 
total numbers (excluding eggs) in these populations, once they were in the region of their 
equilibrium states, ranged from about 80 to 400 individuals, depending on the species and 
treatment. Only in the one case of T. castaneum, living at 34°C., 30 % R.H., did a few of the
replicates become extinct; but it is evident from the tables given that the mean total 
numbers for this species, under these conditions, oscillated in a fairly regular fashion around 
the overall average of about 80 individuals, falling as low as 35 on day 120 and 51 on day 
450 (Park, 1954; p. 188, and Appendix, Table 2). These relatively long-term oscillations 
about the equilibrium level may have been due, in part at least, to the changing age structure 
of these populations, and such oscillations are likely to increase the variance about this 
level, and hence to increase the likelihood of extinction. In all the remaining cases, however, 
the populations of both species persisted (in the absence of infection and inadvertence), 
some of the replicates for each of his physical treatments being observed for a total period 
of 1860 days, or just over 5 years. Prof. Park has very kindly sent me copies of some graphs 
and also the detailed data for all the individual replicates of both species observed at 29°C. 
and 70 % R.H. The mean total number of individuals in these populations when they were 
varying about their equilibrium states, was, approximately, 260 for T. confusum and 400
for T. castaneum. In neither case did there seem to be any tendency for the numbers in any
of the replicates to fall so low that they might be in danger of becoming extinct. Now, it
has been estimated that the mean length of a generation for T. castaneum, living under these
physical conditions, is 55-6 days (Leslie & Park, 1949), so that a period of 1860 days for 
this species represents a total turn-over of about 33-34 generations. Taking the conventional 
mean length of a generation for man as being roughly 30 years, some of these populations 
were observed, therefore, over a period of time which would be equivalent to about 1000 
years in terms of human experience. As Park points out (1954, p. 188) in regard to these 
experimental populations: ‘...it seems reasonable to assume that they would persist 
indefinitely under obtaining procedures of husbandry unless new, deleterious influences, 
whether ecological or genetic, developed spontaneously or were introduced.’ 

* A referee has pointed out that this problem is essentially one of the passage time to the zero state,
and that, broadly speaking, this time is on the average inversely proportional to the probability of that
state if the series were truly stationary, i.e. if the zero state were non-absorbing (cf. Bartlett, 1956,
§ 6.41). In the present example, the probability of the zero state is, in terms of a normal distribution,
less than $5 \times 10^{-10}$, so that the mean passage time to extinction would be of the order of at least
$2 \times 10^{9}$ units of time.










I am indebted to Mr D. G. Kendall for some most valuable criticisms and suggestions 
which he made after seeing an early draft of part of this paper. I should also like to thank 
Prof. T. Park for the generous way he has replied to my many inquiries by sending me the 
detailed data for some of his experiments, and Prof. M. S. Bartlett for allowing me to see 
the proofs of his recent paper before publication. 


REFERENCES 


BARTLETT, M. S. (1956). An Introduction to Stochastic Processes. Cambridge University Press.

BARTLETT, M. S. (1957). On theoretical models for competitive and predatory biological systems.
Biometrika, 44, 27-42.

CHIN LONG CHIANG (1954). Competition and other interactions between species. Article in
Kempthorne, O. et al. (editors). Statistics and Mathematics in Biology, pp. 197-215. Iowa State College
Press.

DEMING, W. E. (1944). Statistical Adjustment of Data. New York: Wiley and Sons.

FELLER, W. (1939). Die Grundlagen der Volterraschen Theorie des Kampfes ums Dasein in
wahrscheinlichkeitstheoretischer Behandlung. Acta biotheor., Leiden, 5, 11-40.

GAUSE, G. F. (1934). The Struggle for Existence. Baltimore: Williams and Wilkins.

KENDALL, D. G. (1949). Stochastic processes and population growth. J.R. Statist. Soc. B, 11, 230-64.

LESLIE, P. H. (1948). Some further notes on the use of matrices in population mathematics.
Biometrika, 35, 213-45.

LESLIE, P. H. (1957). An analysis of the data for some experiments carried out by Gause with
populations of the Protozoa, Paramecium aurelia and Paramecium caudatum. Biometrika, 44, 314-27.

LESLIE, P. H. & PARK, T. (1949). The intrinsic rate of natural increase of Tribolium castaneum
Herbst. Ecology, 30, 469-77.

PARK, T. (1954). Experimental studies of interspecies competition. II. Temperature, humidity, and
competition in two species of Tribolium. Physiol. Zool. 27, 177-238.










ON THE DERIVATION AND APPLICABILITY 
OF NEYMAN’S TYPE A DISTRIBUTION 


By J. G. SKELLAM 


The Nature Conservancy, London 


1. Using a model suggested by ecological processes, Neyman (1939) derived a class of 
‘contagious’ distributions of great interest to biologists. The two-parameter type A dis- 
tribution in particular has found many useful applications, but, in view of the ecologist’s 
prime interest in elucidating the processes of nature, it is important that no misunderstanding
exists as to the kind of ecological or spatial picture which is implied not only in
the original derivation but by subsequent workers (Thompson, 1954). 


2. It was supposed that a number of ‘centres’ were distributed at random in a large field 
F, that each centre gave rise to a number of offspring n (the actual number being a random 
variate with probability function p(n)), and that the offspring from any one centre were 
distributed in the space around that centre independently of one another. The actual law 
(f) governing the distribution of offspring about their centre of origin was not explicitly 
stated, though it was clear that it had the same basic character whatever the position of the 
particular centre of origin. The primary aim was to deduce the probability distribution of 
X, the total number of offspring occurring in a randomly chosen plot or quadrat (here 
called Q), taken as being unit area. 

Neyman denoted the probability that an offspring arising from a centre $(\xi, \eta)$ falls into
Q as

$$ P(\xi, \eta) = \iint_Q f(x - \xi,\, y - \eta)\,dx\,dy, \qquad (1) $$

where the integration extends over all points (x, y) in Q.

By regarding $f(x - \xi, y - \eta)$ as zero for points (x, y) sufficiently removed from $(\xi, \eta)$, those
centres capable of contributing offspring to Q were restricted to a region (here called $\mathcal{A}$)
of area A.

3. After obtaining the general result in terms of the unspecified functions p and f, the 
nature of which may differ in different ecological situations, Neyman considers several 
particular cases. The simplest procedure, leading to the so-called type A distribution, was 
first to regard p(n) as the Poisson function and secondly to set 


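$$
P(\xi, \eta) =
\begin{cases}
A^{-1} & \text{for } (\xi, \eta) \text{ in } \mathcal{A},\\
0 & \text{for } (\xi, \eta) \text{ outside } \mathcal{A}.
\end{cases}
\qquad (2)
$$

This representation of $P(\xi, \eta)$ by a step function was apparently made as a convenient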
approximation, and it is the purpose of this note to examine the full implications of this 
assumption. To do this, equation (2) needs to be taken in the first instance in the strict 
sense in which it is stated, and afterwards we consider what relaxation is possible. 


4. Equations (1) and (2) constitute in effect an implicit definition of the unspecified 
function f by means of an integral equation, and it will be seen in the following argument 
that in order to obtain a solution it is both necessary and sufficient that A should be an 
integer, and that the distribution of offspring in space should consist of A equally sized 







probability masses so spaced that no two are enclosed at the same time by any figure equal 
(in shape and area) to Q and with the same orientation. 

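The nature of the general result is connected with the self-evident fact that, if $F_j(u, v)$
$(j = 1, 2, \ldots, n)$ are a set of cumulative frequency functions such that

$$
\iint_Q dF_j(x - \xi,\, y - \eta) =
\begin{cases}
1 & \text{for } (\xi, \eta) \text{ in } \mathcal{A}_j,\\
\tfrac{1}{2} & \text{for } (\xi, \eta) \text{ on boundary of } \mathcal{A}_j,\\
0 & \text{for } (\xi, \eta) \text{ outside } \mathcal{A}_j,
\end{cases}
\qquad (3)
$$

then

$$ F(u, v) = \frac{1}{n}\sum_{j=1}^{n} F_j(u, v) $$

satisfies

$$
\iint_Q dF(x - \xi,\, y - \eta) =
\begin{cases}
1/n & \text{for } (\xi, \eta) \text{ in } \Sigma\mathcal{A}_j,\\
1/2n & \text{for } (\xi, \eta) \text{ on boundary of } \Sigma\mathcal{A}_j,\\
0 & \text{for } (\xi, \eta) \text{ outside } \Sigma\mathcal{A}_j,
\end{cases}
$$

provided that the $\mathcal{A}_j$ do not overlap, though their boundaries may coincide.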
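5. Without loss of generality, the problem is most easily considered in its one-dimensional
form, viz.

$$
\int_{x=0}^{1} dF(x - \xi) = P(\xi) =
\begin{cases}
A^{-1} & \text{for } \xi \text{ in } (\alpha, \beta),\\
0 & \text{for } \xi \text{ outside } (\alpha, \beta).
\end{cases}
\qquad (4)
$$

Clearly

$$ F(1 - \xi) - F(-\xi) = P(\xi), \qquad (5) $$

where the cumulative function F(u) is by necessity a non-decreasing function of u.
Setting $\xi$ in succession equal to $\alpha - \delta$, $\alpha - \delta - 1$, $\alpha - \delta - 2$, ... in (5), where $\delta$ is an arbitrarily
small positive quantity, we obtain

$$ F(-\alpha + \delta) = F(1 - \alpha + \delta) = \ldots = F(\infty), $$

so that

$$ F(u) = 1 \quad \text{for all } u > -\alpha. \qquad (6) $$

Similarly,

$$ F(u) = 0 \quad \text{for all } u < 1 - \beta. \qquad (7) $$

The simplest particular solution arises when the points $-\alpha$ and $1 - \beta$ coincide, that is
when $\beta - \alpha = 1$, and the whole probability mass therefore is located at this point. Then by
setting $\xi = \alpha + \tfrac{1}{2} = \beta - \tfrac{1}{2}$, we obtain from (4) and (5), $F(\tfrac{1}{2} - \alpha) - F(\tfrac{1}{2} - \beta) = A^{-1}$, whence
from (6) and (7), A = 1.

Consider now the possibility that the length of the interval $(\alpha, \beta)$ is not exactly an integer.
Then $\beta - \alpha = I + q = I - 1 + r$, where I is the integral part of $\beta - \alpha$, $0 < q < 1$ and $1 < r < 2$.
We can now choose sets of points $\xi_i$ spaced one unit apart in $(\alpha, \beta)$ in two main ways:

(i) $\alpha + \tfrac{1}{2}q,\ 1 + \alpha + \tfrac{1}{2}q,\ \ldots,\ I + \alpha + \tfrac{1}{2}q\ (= \beta - \tfrac{1}{2}q)$,
(ii) $\alpha + \tfrac{1}{2}r,\ 1 + \alpha + \tfrac{1}{2}r,\ \ldots,\ I - 1 + \alpha + \tfrac{1}{2}r\ (= \beta - \tfrac{1}{2}r)$.

If the two sets of values of $\xi$ are substituted in (5) and the results for each set added, we obtain
in the two cases

$$ F(1 - \alpha - \tfrac{1}{2}q) - F(-\beta + \tfrac{1}{2}q) = (I + 1)A^{-1}, \qquad (8) $$

$$ F(1 - \alpha - \tfrac{1}{2}r) - F(-\beta + \tfrac{1}{2}r) = I A^{-1}. \qquad (9) $$

From (6) and (7) it follows that we are faced with the contradiction $I + 1 = I$, except for the
possible limiting case where A and I tend to infinity together.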




There is no such contradiction, however, when the length of the interval $(\alpha, \beta)$ is exactly
an integer provided that $P(\xi) = \tfrac{1}{2}A^{-1}$ at the end-points $\alpha$ and $\beta$. It then follows that
$A = I = \beta - \alpha$, and the cumulative function F(u) is a step function with I equal steps,
spaced in succession at unit distance apart.
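The integer case can be verified directly. The sketch below places I equal masses at displacements $-\alpha - k$ $(k = 0, 1, \ldots, I - 1)$, which is one placement consistent with (6) and (7), and evaluates $P(\xi)$ for the unit interval Q = (0, 1). A mass lying exactly on the boundary of Q is counted as one half, which is an assumption chosen to match the end-point condition; the function name `quadrat_probability` is illustrative.

```python
from fractions import Fraction


def quadrat_probability(xi, alpha, steps):
    """P(xi) for Q = (0, 1) when F has `steps` equal steps one unit apart."""
    total = Fraction(0)
    for k in range(steps):
        position = xi - alpha - k            # offspring position for displacement -alpha - k
        if 0 < position < 1:
            total += Fraction(1, steps)      # mass strictly inside Q
        elif position == 0 or position == 1:
            total += Fraction(1, 2 * steps)  # mass on the boundary counts one half
    return total


alpha, steps = Fraction(3, 2), 4             # interval (alpha, beta) of length I = 4
beta = alpha + steps

print(quadrat_probability(alpha + Fraction(1, 3), alpha, steps))   # interior point
print(quadrat_probability(alpha, alpha, steps))                    # end-point alpha
print(quadrat_probability(beta + Fraction(1, 2), alpha, steps))    # outside the interval
```

Exact rational arithmetic is used so that the boundary cases are not disturbed by rounding.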


6. The case just considered may be regarded as one where the interval $(\alpha, \beta)$ can be
resolved into I contiguous pieces of unit length each, and the method of argument is readily
extended to the more general situation where instead of a single interval there are several 
non-overlapping intervals, the lengths of which are integral multiples of the basic unit (Q). 
The nature of the two-dimensional result then becomes apparent. 


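[Fig. 1. Three equal probability masses $M_1$, $M_2$, $M_3$ placed about the centre O; the region $\Sigma\mathcal{A}_j$ is shaded.]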





Fig. 1 illustrates a typical solution of the two-dimensional integral equation where Q
for the purpose of a diagram is arbitrarily shown as a lozenge-shaped region. The manner of
distribution of the offspring about their centre O is represented by three distinct and equal
probability masses $M_1$, $M_2$, $M_3$. The region $\Sigma\mathcal{A}_j$ is shaded. If O lies in $\mathcal{A}_j$, then $M_j$ and only
$M_j$ lies in Q, the shape and pattern of the $\Sigma\mathcal{A}_j$ being determined by the shape of Q and the
pattern of the M's about O. The latter pattern may be chosen arbitrarily subject to the
condition that under translation the M's mutually exclude one another from Q.


7. It is apparent that the exact solution to this problem imposes severe restrictions on 
the ecological picture to which the mathematical model (considered in the strict sense) is 
applicable. Even so there are biological situations which conform to it. 

In the simplest case each centre gives rise to a single compact cluster of offspring located 
at an arbitrary finite distance from the centre of origin. Though $P(\xi, \eta)$ was assumed to be
the same for all centres, there are indeed infinitely many solutions dependent on this
arbitrary quantity, which could, if necessary, be assigned differently to the different centres,
provided, of course, that the displacement is finite and allotted independently of $(\xi, \eta)$
[the position relative to Q in the virtually infinite field F of the randomly located centre].
Since the offspring form a single compact cluster, it does not appear necessary in the case of 
the two-parameter type A distribution to invoke Neyman’s original assumption that the 
offspring have no ‘social instincts’ and are distributed about their centre independently of 
one another. For example, a group of lepidopterous larvae arising from an egg batch could 
persist as a gregarious band. 

In the more general case, each centre gives rise to a ‘pattern’ of compact clusters suffi- 
ciently well spaced from one another for it to be impossible for a quadrat of the size selected 
to include more than one cluster from the same centre. Such might be the case, for example, 







where a plant reproducing vegetatively sends out rhizomes, from which at well spaced 
intervals (e.g. nodes) a variable number of aerial shoots arise close together. The patterns 
of clusters associated with the different centres need not be identical, but it is easy to see 
that, if the number of clusters per centre is not fixed, it becomes necessary for cluster sizes 
belonging to different centres to be independent random values from the same popula- 
tion p(n). 

Neyman’s model for the two-parameter type A distribution, considered in the strict 
sense, can therefore be reduced to a comparatively simple situation, which readily lends 
itself to treatment by generating functions, as for example in Feller (1943) or Skellam (1952). 
For if G(z) is the p.g.f. of the number of centres which contribute a cluster to Q, and if g(z) 
is the p.g.f. of the distribution p(n), then the p.g.f. of the distribution X, the number of 
offspring in Q, is given by Watson’s theorem as G(g(z)). 
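When both G and g are Poisson p.g.f.'s, the composition G(g(z)) gives the two-parameter type A distribution, and its probabilities can be obtained numerically. The sketch below uses the standard recurrence for a compound Poisson distribution rather than any method given in the paper; the parameter names `m1` (mean number of contributing centres) and `m2` (mean cluster size) and the function name `neyman_type_a_pmf` are assumptions made for illustration.

```python
import math


def neyman_type_a_pmf(m1, m2, kmax):
    """P(X = k), k = 0..kmax, for X = n_1 + ... + n_M, M ~ Poisson(m1), n_i ~ Poisson(m2)."""
    f = [math.exp(-m2)]                     # Poisson(m2) probabilities f_0, f_1, ...
    for j in range(1, kmax + 1):
        f.append(f[-1] * m2 / j)
    p = [math.exp(m1 * (f[0] - 1.0))]       # P(X = 0) = G(g(0)) with G(z) = exp(m1 (z - 1))
    for k in range(1, kmax + 1):
        s = sum(j * f[j] * p[k - j] for j in range(1, k + 1))
        p.append(m1 * s / k)                # compound Poisson recurrence
    return p


pmf = neyman_type_a_pmf(2.0, 1.5, 60)
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum(k * k * pk for k, pk in enumerate(pmf)) - mean ** 2
print(round(sum(pmf), 6), round(mean, 4), round(var, 4))
```

The mean and variance returned should agree with the known moments of the type A distribution, `m1 * m2` and `m1 * m2 * (1 + m2)`.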


8. Because of the very special kind of distribution of progeny required to satisfy equations
(1) and (2), the question arises as to how it can happen that the type A distribution fits
such a wide range of ecological data so well. The reason for this circumstance seems to be
that, whilst the dispersal of the progeny about their birth place is continuous and the
probability distribution of their locations at the time of the census is often likely to be
represented by a continuous surface with a maximum in the neighbourhood of $(\xi, \eta)$, the
value of the integral (1) considered as a function of $(\xi, \eta)$ may be quite close to the plateau
defined by (2). The approximation will be particularly good wherever the dimensions of Q
are substantially larger than the greatest distance the progeny are likely to travel between
the time of birth and the census. In this case, whenever $(\xi, \eta)$ is well inside Q, the probability
$P(\xi, \eta)$ will differ from unity by only a small quantity; for $(\xi, \eta)$ well outside Q, the value of
$P(\xi, \eta)$ will be virtually zero; whilst, for $(\xi, \eta)$ on or near the boundary, $P(\xi, \eta)$ will be roughly
$\tfrac{1}{2}$, the last class of cases being considerably less frequent and less important than the first.

The above discussion suggests that, in the type of ecological situation which Neyman 
originally had in mind, the type A distribution has a better chance of fitting empirical 
distributions obtained by counting in large rather than small quadrats. 
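The plateau behaviour described in § 8 is easily illustrated in one dimension. The sketch below assumes, purely for illustration, that the displacement of an offspring from its centre is normally distributed with a standard deviation that is small compared with the unit quadrat Q = (0, 1); neither the normal law nor the value 0.05 comes from the paper, and the function names are hypothetical.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def p_quadrat(xi, sigma):
    """P(xi): chance that an offspring from a centre at xi lies in Q = (0, 1)."""
    return std_normal_cdf((1.0 - xi) / sigma) - std_normal_cdf((0.0 - xi) / sigma)


sigma = 0.05    # assumed dispersal distance, small relative to Q
for xi in (-0.5, 0.0, 0.5, 1.0, 1.5):
    print(f"xi = {xi:4.1f}   P = {p_quadrat(xi, sigma):.6f}")
```

With these assumed values P is effectively unity well inside Q, zero well outside, and one half on the boundary, which is the step function (2) with A = 1.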


9. Even so, the possibility that approximate solutions to the problem exist in opposite
circumstances when Q is small and $\mathcal{A}$ very large is suggested by the limiting consideration
mentioned at the end of § 5.

Consider first of all a number of exact solutions to (1) and (2), represented diagrammatically
in Fig. 2, where Q is represented as a single square, $O_1$, $O_2$, $O_3$ are randomly placed
centres, and associated with each is a set of minute clusters, all with the same probability
mass. The spatial extent of each set of clusters is shown by contours enclosing regions
$R_1$, $R_2$, .... There is here no need to show $\mathcal{A}_1$, $\mathcal{A}_2$, ..., though it may be remarked that $\mathcal{A}_j$ and
$R_j$ are alike in shape and area though not in position and orientation.

If the populations of the individual small clusters within $R_j$ are Poisson variates all with
parameter $\lambda$, the expected number of offspring in $R_j$ is $\lambda A_j$, so that even if the offspring were
not in dense clusters but distributed at random throughout $R_j$ the number contributing
to any unit square in $R_j$ would still be a Poisson variate with mean $\lambda$. It may be remarked
here that, since A is large, Neyman's assumption on the independent dispersal of offspring
from their centre implies that the number of offspring per unit square is a Poisson variate.

The limiting case (with the implication that p(n) is Poisson) is then virtually equivalent
to one where the offspring from a centre $O_j$ are spread out independently and at random with
uniform probability density throughout some extensive region $R_j$. If the centres $O_j$ are
numerous and at random, the number of centres contributing offspring to Q will be a
Poisson variate, and, if the probability densities in the regions $R_j$ are the same for all j,
their contributions to Q will all be Poisson variates with a common distribution. The
Neyman type A distribution based on two parameters results immediately by the com- 
pounding of these two Poisson distributions as indicated at the end of § 7. 




























Fig. 2 


CONCLUSION


It would appear that the sampling situation to which the Neyman type A is particularly 
suited is one where the organisms occur in compact clusters, an observation which may be 
equally well applied to a number of other compound distributions with p.g.f.’s of the form 
exp {A(g(z) — 1)}. The compactness of the clusters is in fact a condition implied as a hidden 
assumption in Neyman’s original method of derivation. Nevertheless, the type A distribu- 
tion exhibits considerable robustness, and can be employed as an approximation in certain 
circumstances where the condition requiring compact clustering can be greatly relaxed. 


I am particularly grateful to Prof. Jerzy Neyman for his interest in this problem, his 
scrutiny of the argument and for his valuable suggestions on its presentation. 


REFERENCES 


FELLER, W. (1943). Ann. Math. Statist. 14, 389-400.
NEYMAN, J. (1939). Ann. Math. Statist. 10, 35-57.
SKELLAM, J. G. (1952). Biometrika, 39, 346-62.
THOMPSON, H. R. (1954). Biometrika, 41, 268-71.










NEGATIVE BINOMIAL DISTRIBUTIONS WITH A COMMON k 
By C. I. BLISS† AND A. R. G. OWEN‡


1. INTRODUCTORY


Many biological experiments are evaluated by counts of the number of individuals per unit 
of space or of events per unit of time. When the individuals or events are randomly dispersed, 
the variance of the count, within sampling fluctuations, is equal to the mean, but in a 
natural population, the observed variance often exceeds its mean significantly. Of the 
several distributions by which this ‘overdispersion’ may be described, the negative bi- 
nomial has a number of advantages. It is defined by the arithmetic mean m and a parameter 
k measuring dispersion. It can be derived from a variety of initial assumptions. Its wide 
applicability has been demonstrated empirically. Its statistics are well known (Anscombe, 
1950; Bliss & Fisher, 1953; D. A. Evans, 1953). 

Observed counts are often compared with each other in terms of their means. These 
comparisons are more direct and unequivocal if the respective distributions have the same 
relative dispersion in terms of k. Thus, in devising sequential sampling schemes for tape- 
worm cysts in white fish (Oakland, 1950) or for spruce budworm on balsam fir (Morris, 1954; 
Waters, 1955), a common k is an essential part of the underlying model. In studying en- 
vironmental factors modifying trawl catches of haddock (Taylor, 1953) or the survival of 
hemlock seedlings (Olson, 1954), a stable k would simplify the comparisons materially. 
Tests of insecticides and of fungicides may be judged from the number of surviving insects 
or damaged parts of plants. When treatments are applied in randomized blocks or other 
experimental designs, and their evaluation is based upon a count in each plot, counts 
transformed with an estimated common k will give the most informative analysis of vari- 
ance (Beall, 1942; Anscombe, 1949). In these and other cases, the negative binomial samples 
with a single k may vary markedly in their means. 

Several estimates of a common k have been described. Beall (1942) proposed an un- 
weighted moment estimate from the counts on duplicate plots in each block, a design 
which would double the size of each block and thereby increase the experimental error. 
When the standard deviation is linearly related to the mean of each set of counts, Klecz- 
kowski (1949) has shown that the intercept of an unweighted estimate of this regression 
provides an empirical constant similar to a common k, with which the variance can be
stabilized. Since the information in each sample is a function of its mean, Anscombe (1949, 
1950) has derived weights for computing a common k, suited to different methods for 
estimating k. Bliss & Fisher (1953) have described a maximum likelihood estimate which 
is efficient for all values of m and k but is hardly practicable when individual counts exceed 
20 or 30. 

The present paper concerns two approaches to the problem of estimating a common k, 
both through successive approximations. The first is an extension of Anscombe’s weighted 
moment estimate in terms of regression and small series. It is adapted for field studies and 
surveys where subsampling is customary. The second follows from the need for a common 


† The Connecticut Agricultural Experiment Station and Yale University.
‡ Department of Genetics, Cambridge University.










k when transforming negative binomial counts preparatory to an analysis of variance.
Subsampling is secondary and commonly omitted. If the estimated k is valid, the transformed
units will conform to the underlying assumption of additivity. We will reverse this
operation and estimate k as that value giving zero non-additivity by the Tukey (1949) test.
The resulting k fulfils a specific purpose and its limitations as an estimate are here
considered secondary.


2. THE MOMENT ESTIMATION OF A COMMON k 


(2.1). Derivation of a regression method. The moment estimate of k for a single negative
binomial distribution has long been computed as

$$k_1 = \bar{u}^2/(s^2 - \bar{u}), \qquad (1)$$

where u is one of N individual counts and $\bar{u}$ and $s^2$ are their observed mean and variance,
respectively (Fisher, 1941; Anscombe, 1950). Alternatively, we start with the two statistics

$$x' = \bar{u}^2 - s^2/N \quad\text{and}\quad y' = s^2 - \bar{u}. \qquad (2)$$

Their expectations are given exactly by

$$E(x') = m^2, \qquad E(y') = m^2/k. \qquad (3)$$

Thus, $y' - x'/k$ has zero expectation.
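As a minimal sketch of equations (1) and (2), the following Python function computes $x'$, $y'$ and the moment estimate $k_1$ from a list of counts. The function name `moment_stats` and the example counts are illustrative assumptions, not taken from the paper; returning infinity when $y' \le 0$ is likewise only a convention adopted here, since $k_1$ is not defined for a sample that shows no overdispersion.

```python
from statistics import mean, variance


def moment_stats(counts):
    """Return (x', y', k1) for one sample of counts, as in equations (1) and (2)."""
    n = len(counts)
    u_bar = mean(counts)
    s2 = variance(counts)      # sample variance with divisor n - 1
    x = u_bar ** 2 - s2 / n    # x' has expectation m^2
    y = s2 - u_bar             # y' has expectation m^2 / k
    # k1 is only meaningful when the sample is overdispersed (y' > 0)
    k1 = u_bar ** 2 / y if y > 0 else float("inf")
    return x, y, k1


counts = [0, 1, 1, 2, 3, 5, 8, 12]   # hypothetical counts, not from the paper
x_prime, y_prime, k1 = moment_stats(counts)
print(x_prime, y_prime, k1)
```

Because $k_1 = \bar{u}^2/y'$ while the regression statistic is $x'/y'$, the two differ only by the term $s^2/(Ny')$.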


For a single sample, we have the ratio $y'/x'$ as an estimate of 1/k. The efficiency of this
estimate is the same as the efficiency of $k_1$ above, whose large sample variance (Anscombe,
1950) is

$$V(k_1) = \frac{2k(k+1)}{N}\left(1 + \frac{k}{m}\right)^2, \qquad (4)$$

correct to order 1/N. The large sample variance of $y'/x'$, therefore, is

$$V(y'/x') = \frac{2k(k+1)}{Nk^4}\left(1 + \frac{k}{m}\right)^2. \qquad (5)$$

The use of 1/k as the parameter for estimation (with estimate $y'/x'$ in the case of a single
sample) has several advantages. As noted by Anscombe (1950), its bias is small, being in
fact of order $1/N^2$. The statistic $k_1$ or its modified form $x'/y'$ (which has the same efficiency
as $k_1$), however, has a positive bias which may be seriously large, namely

$$2(k+1)\,\ldots \qquad (6)$$

(correct to terms in 1/N). Thus, when considering the working of examples, neither $k_1$ nor
$x'/y'$ is easily comparable with the maximum likelihood estimate $\hat{k}$ because of the large
bias of $k_1$ and the unknown bias of $\hat{k}$.

When a common value for k is to be estimated from a set of parallel samples of size N,
drawn from populations with different means $m_i$, the idea of estimating 1/k from the ratio
$y'/x'$ suggests that a very natural and manageable process would be to estimate 1/k as the
slope of a regression of $y'$ upon $x'$. The argument may be put in the following form.

The expectation of $y' - x'/k$ is exactly zero and the variance of this expression is given
to order $1/N^2$ by

$$V = \frac{2m^2(m+k)^2}{(N-1)k^4}\left\{k(k+1) - \frac{2k-1}{N} - \frac{3}{N^2}\right\}. \qquad (7)$$

The invariance $w = 1/V$ is of the nature of a weight, which may be written as

$$w = \frac{0.5(N-1)k^4}{k(k+1) - (2k-1)/N - 3/N^2}\cdot\frac{1}{m^2(m+k)^2}. \qquad (8)$$





If calculated by replacing m by its efficient estimate $\bar{u}$, the expectation $m^2$ by $x'$, and k by
an empirical trial value $k'$, we can consider the weighted sum of squares

$$\Sigma\{w(y' - x'/k')^2\} \qquad (9)$$

as approximately a $\chi^2$. This $\chi^2$ is minimized by choosing $1/k'$ to satisfy the equation

$$\Sigma\{wx'(y' - x'/k')\} = 0,$$

that is, by

$$1/k_c = \Sigma(wx'y')/\Sigma(wx'^2). \qquad (10)$$


This method clearly obtains an estimate of 1/k as the slope of a linear regression of $y'$ on
$x'$, the regression line being constrained to pass through the origin ($x' = 0$, $y' = 0$). An
initial plotting of the regression suggests a possible trial value of 1/k, determined as the
slope of a line through the origin fitted by some simple procedure. Further, it enables a
preliminary assessment of the heterogeneity of the various samples in respect to k.

When successive approximations have led to an estimated $1/k_c$ which differs negligibly
from its last trial value, the calculation provides a $\chi^2$ test for the homogeneity of k in the
different samples. Since the theoretical regression line is constrained to pass through the
origin, the observed value of

$$\chi^2 = \Sigma(wy'^2) - \Sigma^2(wx'y')/\Sigma(wx'^2), \qquad (11)$$

may be compared with the tabular $\chi^2$ distribution. (Here and elsewhere $\Sigma^2(-) = [\Sigma(-)]^2$.)
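The step from the weighted sum of squares (9) to the slope (10) and the residual (11) can be written out as follows. This is a sketch under the assumption that the weights w are held fixed at their trial values while the slope is varied; the symbols $\beta$ (for $1/k'$) and $S$ are introduced here only for the derivation.

```latex
\begin{aligned}
S(\beta) &= \Sigma\, w\,(y' - \beta x')^2, \qquad \beta = 1/k', \\
\frac{dS}{d\beta} &= -2\,\Sigma\, w x'\,(y' - \beta x') = 0
  \;\Longrightarrow\; \hat\beta = \frac{\Sigma(wx'y')}{\Sigma(wx'^2)}, \\
S(\hat\beta) &= \Sigma(wy'^2) - 2\hat\beta\,\Sigma(wx'y') + \hat\beta^2\,\Sigma(wx'^2)
  = \Sigma(wy'^2) - \frac{\Sigma^2(wx'y')}{\Sigma(wx'^2)} .
\end{aligned}
```

The last line is the quantity compared with the tabular $\chi^2$ distribution.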

(2.2). Trial estimates of $k_c$. The initial trial estimate of a common k may be obtained
graphically or as a simple ratio. The statistics $x'$ and $y'$ for each of the g distributions in a
series may be computed from $\bar{u}$, $s^2$ and N by equation (2), or directly from the totals $\Sigma u$ (= U)
and $\Sigma u^2$ of N unit counts, as

$$x' = \frac{U^2 - \Sigma u^2}{N(N-1)}, \qquad y' = \frac{N\,\Sigma u^2 - U^2 - (N-1)U}{N(N-1)}. \qquad (12)$$

Occasional values of $y'$ which may be near zero or negative are included, of course, with
the others, each with its proper sign.

The diagram is fitted provisionally by a straight line passing through the origin with
a slope selected graphically or approximated by

$$b = 1/k' = \Sigma y'/\Sigma x'. \qquad (13)$$

When $\bar{u}$, or $wx'$ in the later calculation, does not differ markedly among series, this
unweighted estimate is often a good approximation to $1/k_c$, computed from equation (10).
For a graphic test of apparent non-linearity in the regression, the ratio $y'/x'$ for each sample
may be plotted against its mean $\bar{u}$. The plotted points should agree substantially with a
horizontal line if the entire series of distributions can be described with a single $k_c$. This
diagram also aids in spotting gross outliers. Since the ratios $y'/x'$ are needed later in the
analysis, enough places should be recorded to avoid subsequent rounding errors. The sum
of these ratios, g in number, provides still another estimate of the provisional $k'$,

$$k' = g/\Sigma(y'/x'), \qquad (14)$$

of value when $\bar{u}$ varies excessively.
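The two trial values of equations (13) and (14) can be compared on the haddock series, using the $x'$ and $y'$ columns of Table 2. This is only a sketch: the function names are illustrative assumptions, and the data are the fifteen tabulated pairs as printed.

```python
# x' and y' for the fifteen haddock series of Table 2
x = [2782.0, 347.0, 728.4, 3874.0, 1994.4, 189.5, 2582.7, 48256.4,
     320.2, 1657.2, 67495.3, 13.18, 3870.3, 4331.2, 247.4]
y = [3038.4, 235.4, 1363.0, 31210.8, 10961.3, 401.1, 4859.8, 123724.5,
     163.2, 8674.2, 633109.4, 35.0, 59614.2, 13650.1, 1136.9]


def trial_k_ratio_of_sums(x, y):
    """Equation (13): k' = sum(x') / sum(y')."""
    return sum(x) / sum(y)


def trial_k_mean_of_ratios(x, y):
    """Equation (14): k' = g / sum(y'/x'), with g the number of series."""
    return len(x) / sum(yi / xi for xi, yi in zip(x, y))


k13 = trial_k_ratio_of_sums(x, y)
k14 = trial_k_mean_of_ratios(x, y)
print(round(k13, 3), round(k14, 3))
```

Equation (13) is dominated by the few series with very large $x'$, whereas equation (14) gives every series the same say, which is why the two trial values diverge when the means differ widely.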







The first provisional estimate is illustrated by distributions from ‘plots’ within subareas,
or ‘blocks’. These have been superimposed upon Beall’s (1939) counts of adult Colorado
potato beetles (Leptinotarsa decemlineata Say) in each two-foot unit of row in an untreated,
heavily infested field of potatoes. Each of sixteen subareas or blocks was divided into eight
plots, each two rows wide and 10 ft. long and separated by single guard rows as would be
customary in a field experiment. With u equal to the total count on the ten units in each
plot, $\bar{u}$ and $s^2$ (Table 1) were computed for each subarea from its eight plots (N = 8), leading


Table 1. Estimation of $k_c$ in eight plots (N = 8) within sixteen blocks, from counts (u) of
Leptinotarsa decemlineata in a uniformly treated potato field (Beall, 1939); $x'$ and $y'$
computed by equation (2)

Block   $\bar{u}$    $s^2$      $x'$       $y'$     $y'/x'$   $(\bar{u}+k')^2$   $wx'$     $wy'$
  1     75.75    539.07    5,670.7    463.32    0.08170    7,972.7    0.07623    0.006228
  2     84.00    627.14    6,977.6    543.14    0.07784    9,514.1    0.06388    0.004972
  3     42.25    257.93    1,752.8    215.68    0.12305    3,112.5    0.19526    0.024027
  4     32.50     77.43    1,046.6     44.93    0.04293    2,119.7    0.28671    0.012308
  5     48.00    172.57    2,282.4    124.57    0.05458    3,787.2    0.16047    0.008758
  6     48.62     91.12    2,352.5     42.50    0.01807    3,863.9    0.15729    0.002842
  7     66.50    737.43    4,330.1    670.93    0.15495    6,406.4    0.09486    0.014699
  8     56.75     71.64    3,211.6     14.89    0.00464    4,940.7    0.12301    0.000571
  9     64.88    101.27    4,196.8     36.39    0.00867    6,149.7    0.09882    0.000857
 10     52.12    417.55    2,664.3    365.43    0.13716    4,311.2    0.14097    0.019335
 11     40.12    137.55    1,592.4     97.43    0.06118    2,879.4    0.21106    0.012913
 12     26.62     48.55      702.6     21.93    0.03121    1,612.8    0.37682    0.011761
 13     34.12    252.41    1,132.6    218.29    0.19273    2,271.5    0.26755    0.051565
 14     34.12     14.70    1,162.3    -19.42   -0.01671    2,271.5    0.26755   -0.004471
 15     13.88     45.84      186.9     31.96    0.17100      751.9    0.80827    0.138214
 16     37.62    167.70    1,394.3    130.08    0.09329    2,617.3    0.23220    0.021662
Total  757.85  3,759.90   40,656.5  3,002.05    1.23629       -       3.56095    0.326241

$k'$ = 40,656/3002.05 = 13.54 (eqn. 13). $wx'$ = 607.74/$(\bar{u}+k')^2$ (eqn. 16).
$\Sigma(wx'^2)$ = 5542.405, $\Sigma(wx'y')$ = 410.3434, $k_c$ = 13.507 (eqn. 17).
$\Sigma(wy'^2)$ = 49.5943, $B_0^2$ = 30.3806 (eqn. 21). $\chi^2$ = 19.214 (eqn. 11), n = 15.
$\Sigma w$ = 0.0063082, C = 16.8722 (eqn. 23).
$[wx'^2]$ = 3532.265, $[wx'y']$ = 226.1819, $[wy'^2]$ = 32.7221 (eqn. 22). $B^2$ = 14.4831 (eqn. 24).

Effect of                       D.F.    M.S.       F
Slope, $1/k_c$                    1     30.3806    21.65
Computed intercept against 0     1      0.9747     0.69
Error                           13      1.4030

$1/k_c$ = 0.0740 ± 0.0134, t = 1.960 at P = 0.05.
Confidence limits: for $1/k_c$, 0.10036 and 0.04771; for $k_c$, 9.964 and 20.960.



















to $x'$, $y'$, and $y'/x'$ in the next three columns. In Fig. 1, $y'$ has been plotted against $x'$ and
fitted provisionally with $1/k'$ = 3002.05/40,656.5 = 0.0738 (equation 13). No untoward
trend is evident, either in Fig. 1, or in the plot of $y'/x'$ against $\bar{u}$ in Fig. 2.

Distributions with a variable N are represented in Table 2 by the trawl catches of haddock
on Georges Bank over three summers (Taylor, 1953). The number of tows (N) at each of
three depths in six subareas varied more than tenfold. Three series have been omitted,


[Figure: $y'$ (ordinate) against $x'$ (abscissa).]
Fig. 1. Regression estimate of $k_c$ for the distribution of the beetle Leptinotarsa in eight plots within
each of sixteen blocks (Table 1). The slope of the weighted regression is $1/k_c$ = 0.0740 ± 0.0134.

















[Figure: $1/k_1$ (ordinate) against $\bar{u}$ (abscissa).]
Fig. 2. Relation in each block of Table 1 of the observed $1/k_1 = y'/x'$ to
the mean number of beetles ($\bar{u}$) per plot.










two which gave no information regarding k ($x' = y' = 0$), and one with a single very large
catch for which $y'/x'$ = 6450. In the remaining 15 series the number of tows varied from
4 to 47, a plot of $y'/x'$ against $\bar{u}$ revealing no marked trend either up or down.


Table 2. Estimation of $k_c$ from trawl catches of haddock (Taylor, 1953),
N variable; $x'$ and $y'$ computed by equation (12)

Subarea
and depth    N    U = $\Sigma u$    $\Sigma u^2$    N(N-1)      $x'$         $y'$       $\bar{u}$     $y'/x'$
G I         10        556        58,760       90     2,782.0      3,038.4     55.60     1.0922
G II        10        193         6,017       90       347.0        235.4     19.30     0.6784
G III       42      1,159        88,997    1,722       728.4      1,363.0     27.60     1.8712
H I         15      1,158       527,430      210     3,874.0     31,210.8     77.20     8.0565
H II        13        693       169,117      156     1,994.4     10,961.3     53.31     5.4960
H III       29        414        17,542      812       189.5        401.1     14.28     2.1166
J I          4        247        30,017       12     2,582.7      4,859.8     61.75     1.8817
J II        26      5,987     4,477,491      650    48,256.4    123,724.5    230.27     2.5639
J III       19        345         9,529      342       320.2        163.2     18.16     0.5097
M I         18        833       186,797      306     1,657.2      8,674.2     46.28     5.2343
M II        41     11,808    28,736,600    1,640    67,495.3    633,109.4    288.00     9.3801
M III       11         45           575      110       13.18         35.0      4.09     2.6555
N I         16      1,395     1,017,147      240     3,870.3     59,614.2     87.19    15.4030
N II        38      2,603       685,893    1,406     4,331.2     13,650.1     68.50     3.1516
O II        47        775        65,833    2,162       247.4      1,136.9     16.49     4.5954
Total        -          -             -        -   138,689.2    892,177.3        -     64.6861

$k'$ = 15/(64.6861) = 0.23 (eqn. 14). A = 0.001399(N - 1)/(0.2829 + 0.54/N - 3/N^2) (eqn. 15).
$\Sigma(wx'^2)$ = 1.2831, $\Sigma(wx'y')$ = 5.3178, $k_c$ = 0.2413 (eqn. 17).
$\Sigma(wy'^2)$ = 35.264, $B_0^2$ = 22.040 (eqn. 21), $\chi^2$ = 13.224, n = 14.
$1/k_c$ = 4.145 ± 0.8830, t = 1.960 at P = 0.05.
Confidence limits: for $1/k_c$, 5.8759 and 2.4143; for $k_c$, 0.1702 and 0.4142.


In our third example k changes progressively. A count of the number of wireworms in
175 sampling units of soil in each of twenty-four fields gave the means ($\bar{u}$) and variances ($s^2$)
in Table 3 (Jones, 1937). Although the relation between $y'$ and $x'$ (Fig. 3) seemed consistent
with a common k, $y'/x'$ decreased as $\bar{u}$ increased (Fig. 4). The applicability of a single common
k to this series might well be questioned.

(2.3). The weighted estimate of $k_c$. Since the component distributions do not contribute
equally to the estimation of $k_c$, as is evident from the wider scatter of $y'$ at the larger values
of $x'$ in Fig. 1, each is weighted inversely as its variance. Instead of computing each weight
separately by equation (8) we may determine the product $wx'$ directly from $(\bar{u} + k')^2$ and
the initial term in the weight

$$A = \frac{0.5(N-1)k'^4}{k'(k'+1) - (2k'-1)/N - 3/N^2}. \qquad (15)$$

If N is the same for each distribution, A is a constant; if N varies, $k'$ or N may be large
enough to warrant omitting the term $3/N^2$. In any case, $wx'$ for each distribution is obtained
as

$$wx' = A/(\bar{u} + k')^2. \qquad (16)$$

Summing the products of $wx'$ by $x'$ and of $wx'$ by $y'$ leads to the estimate

$$k_c = \Sigma(wx'^2)/\Sigma(wx'y'). \qquad (17)$$
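One cycle of the weighted calculation can be sketched in Python on the block means and variances of Table 1 (N = 8), starting from the trial value $k'$ = 13.54. The function names are illustrative assumptions; the code forms A by equation (15), $wx'$ by equation (16), $k_c$ by equation (17) and the homogeneity $\chi^2$ by equation (11), and should land close to the values printed under Table 1, apart from rounding in the tabulated means and variances.

```python
def weight_constant(n, k):
    """Equation (15): the part of the weight that does not depend on the mean."""
    return 0.5 * (n - 1) * k ** 4 / (k * (k + 1) - (2 * k - 1) / n - 3 / n ** 2)


def weighted_step(means, variances, n, k_trial):
    """One cycle of equations (15)-(17); returns (k_c, chi2 of equation (11))."""
    a = weight_constant(n, k_trial)
    s_xx = s_xy = s_yy = 0.0
    for u_bar, s2 in zip(means, variances):
        x = u_bar ** 2 - s2 / n              # x', equation (2)
        y = s2 - u_bar                       # y', equation (2)
        wx = a / (u_bar + k_trial) ** 2      # wx', equation (16)
        s_xx += wx * x                       # sum of w x'^2
        s_xy += wx * y                       # sum of w x' y'
        s_yy += wx * y * y / x               # sum of w y'^2, with w = wx'/x'
    return s_xx / s_xy, s_yy - s_xy ** 2 / s_xx


# Block means and variances from Table 1 (Beall's potato beetle counts)
means = [75.75, 84.00, 42.25, 32.50, 48.00, 48.62, 66.50, 56.75,
         64.88, 52.12, 40.12, 26.62, 34.12, 34.12, 13.88, 37.62]
variances = [539.07, 627.14, 257.93, 77.43, 172.57, 91.12, 737.43, 71.64,
             101.27, 417.55, 137.55, 48.55, 252.41, 14.70, 45.84, 167.70]

k_c, chi2 = weighted_step(means, variances, n=8, k_trial=13.54)
print(round(k_c, 2), round(chi2, 1))
```

Calling `weighted_step` again with `k_trial=k_c` gives the next approximation; the cycle is repeated until the estimate and its trial value agree.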


Table 3. Analysis of counts of wireworms (Limonius) in soil from N = 175 square-foot
units in each of twenty-four irrigated fields in southern Washington (Jones, 1937)

Field no.    $\bar{u}$      $s^2$       $x'$       $y'$     $y'/x'$
    2        0.389     0.584     0.148     0.195    1.318
    3        0.714     1.245     0.503     0.531    1.056
    4        0.829     1.470     0.678     0.642    0.947
    5        0.869     1.322     0.747     0.453    0.606
    6        0.846     1.620     0.706     0.774    1.096
    7        1.246     2.939     1.535     1.694    1.104
    8        1.623     3.506     2.614     1.884    0.721
    9        2.949     6.653     8.656     3.704    0.428
   10        3.023     9.316     9.085     6.293    0.693
   11        3.154     9.166     9.897     6.011    0.607
   12        3.154     9.637     9.895     6.483    0.655
   13        3.857    14.198    14.80     10.341    0.699
   14        3.966    14.28     15.65     10.315    0.659
   15        3.537     8.45     12.46      4.914    0.394
   16        4.166     5.33     17.32      1.169    0.068
   17        4.509    11.35     20.26      6.846    0.338
   18        6.046    22.96     36.42     16.92     0.465
   19        6.366    26.18     40.37     19.82     0.491
   20        7.029    34.78     49.20     27.75     0.564
   21        7.794    32.81     60.56     25.02     0.413
   22        8.434    27.09     70.98     18.66     0.263
   23        8.160    22.26     66.46     14.10     0.212
   24        8.806    47.38     77.27     38.57     0.499
   25       10.886    60.42    118.15     49.54     0.419

$1/k'$ = 0.4231 (eqn. 13). $1/k'$ = 0.6131 (eqn. 14), provisional $k'$ = 1.93.
Second approximation: $\Sigma(wx'^2)$ = 2326.28, $\Sigma(wx'y')$ = 1118.63, $k_c$ = 2.080, $\Sigma(wy'^2)$ = 637.871,
$\Sigma w$ = 474.25, $\chi^2$ = 99.96, n = 22 (eqn. 11).
Unweighted regression of $y'/x'$ on $\log \bar{u}$: b = -0.626 ± 0.091.

Effect of                       D.F.    S.S.      M.S.      F
Slope, $1/k_c$                    1     537.91    537.91    156.41
Computed intercept against 0     1      27.75     27.75      8.07
Error                           21      72.21      3.439



























































If $k_c$ should differ appreciably from its trial value $k'$, the columns of $(\bar{u} + k')^2$ and of $wx'$ are
redetermined, with the initial $k'$ in equation (16) replaced by $k_c$. This may have a relatively
small effect upon the estimate of $k_c$ but is essential for a valid test of homogeneity.

These steps are illustrated in Table 1 for N constant and in Table 2 for N variable. When
the entries of $wx'$ are relatively stable with few varying more than fivefold, the computed
estimate ($k_c$ by equation (17)) usually differs but little from its unweighted trial value ($k'$ by
equation (13)). Thus in Table 1, $k'$ = 13.54 and $k_c$ = 13.507. With large differences in $\bar{u}$, as
in Table 2, equation (14) may give a better trial value ($k'$ = 0.232) than equation (13)
($k'$ = 0.155), as confirmed by the weighted estimate, $k_c$ = 0.2413.

[Figure: $y'$ (ordinate) against $x'$ (abscissa).]
Fig. 3. Regression estimate of $k_c$ for the distribution of wireworms in 175 sampling units in each of
twenty-four irrigated fields (Table 3). The slope of the weighted regression is $1/k_c$ = 0.481.

[Figure: $y'/x'$ (ordinate) against $\log \bar{u}$ (abscissa).]
Fig. 4. Relation of the observed $1/k_1 = y'/x'$ in each field of Table 3 to $\log \bar{u}$, where $\bar{u}$ is the mean
number of wireworms per sampling unit. The two fields represented by shaded circles are combined
in Fig. 6.

If we ignore the apparent trend of $y'/x'$ in Fig. 4, we may compute a common k for the
wireworm counts in Table 3. Here the harmonic mean ($k'$ = 1.93) of the trial values defined
by equation (13) ($k'$ = 2.364) and by equation (14) ($k'$ = 1.631) led to the initial weighted
estimate, $k_c$ = 2.069. A second approximation gave $k_c$ = 2.080, a change of only 0.5%
from the first weighted estimate.

(2.4). Grouped distributions. Cases arise with relatively many distributions or samples,
but sometimes with few individual counts in each component sample, as when the number
of survivors in each plot of an insecticide test is recorded in two or more equal subplots.
Thus, in the insecticidal experiment upon leather-jackets reported by Bartlett (1936), the
number of survivors was counted in two subareas on each plot. Grouping facilitates their
analysis by reducing the number of entries that need be handled. We shall consider only
the case where the number of counts N is constant in each sample.

The component samples are grouped into approximately equal intervals on the basis of
the total count U in each plot. Although an average $x'$ and $y'$ is determined for each grouping
interval, a separate $x'$ and $y'$ need not be determined for each individual sample. For
N > 2, equation (12) is solved for each grouping interval with four terms: the number of
samples f, the total count $\Sigma U$ in the f samples, the sum of their squared totals $\Sigma U^2$, and the
pooled total sum of squares of the individual counts u or $\Sigma\Sigma u^2$. Then

$$x' = \frac{\Sigma U^2 - \Sigma\Sigma u^2}{fN(N-1)}, \qquad y' = \frac{N\,\Sigma\Sigma u^2 - \Sigma U^2 - (N-1)\,\Sigma U}{fN(N-1)}. \qquad (18)$$

When N = 2, these equations simplify to

$$x' = \frac{\Sigma U^2 - \Sigma d^2}{4f} \quad\text{and}\quad y' = \frac{\Sigma d^2 - \Sigma U}{2f}, \qquad (18a)$$

where d is the difference between the two counts in each sample.

The weight for each $x'$ and $y'$ is that given in equation (16) multiplied by f. For each
grouping interval, the mean count is $\bar{u} = \Sigma U/(fN)$ and

$$wx' = \frac{fA}{(\bar{u} + k')^2}. \qquad (19)$$

When N ≥ 3, a trial estimate of k equivalent to that in equation (13) may be computed from
the totals of each term in the numerators of $x'$ and $y'$ in equations (18) or (18a). Alternatively,
if $\bar{u}$ varies markedly, equation (14) may be modified to

$$k' = \Sigma f/\Sigma(fy'/x'). \qquad (20)$$


The weighted estimate is frequently intermediate between these two trial values. 
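The grouped statistics can be checked numerically with a short Python sketch. The function names are illustrative assumptions; the first call uses the first grouping interval of Table 4 (f = 7, N = 10), the second uses the Table 4 totals to recover the trial value, and the last lines confirm on three made-up pairs of counts that the N = 2 shortcut agrees with the general form.

```python
def grouped_xy(f, n, sum_U, sum_U2, sum_u2):
    """General grouped x' and y' for f samples of n counts each."""
    denom = f * n * (n - 1)
    x = (sum_U2 - sum_u2) / denom
    y = (n * sum_u2 - sum_U2 - (n - 1) * sum_U) / denom
    return x, y


def grouped_xy_pairs(f, sum_U, sum_U2, sum_d2):
    """Shortcut for n = 2, using the within-sample differences d."""
    return (sum_U2 - sum_d2) / (4 * f), (sum_d2 - sum_U) / (2 * f)


# First grouping interval of Table 4
x1, y1 = grouped_xy(f=7, n=10, sum_U=71, sum_U2=801, sum_u2=171)

# Totals of Table 4: the ratio x'/y' is the trial value k'
xt, yt = grouped_xy(f=128, n=10, sum_U=6063, sum_U2=355347, sum_u2=46997)
k_trial = xt / yt

# Hypothetical pairs of counts (not from the paper) to compare both forms
pairs = [(3, 5), (0, 2), (7, 4)]
sum_U = sum(a + b for a, b in pairs)
sum_U2 = sum((a + b) ** 2 for a, b in pairs)
sum_u2 = sum(a * a + b * b for a, b in pairs)
sum_d2 = sum((a - b) ** 2 for a, b in pairs)
general = grouped_xy(len(pairs), 2, sum_U, sum_U2, sum_u2)
shortcut = grouped_xy_pairs(len(pairs), sum_U, sum_U2, sum_d2)
print(x1, y1, k_trial, general, shortcut)
```

The agreement of the two forms follows from $\Sigma u^2 = (U^2 + d^2)/2$ for a pair of counts.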

The estimation of k, from grouped distributions is illustrated in Table 4 with the initial 
potato beetle counts within each of the 128 individual plots summarized in Table 1. Each 
plot consisted of N = 10 initial counts and all distinctions between blocks have been 










disregarded in grouping the plots. The computed $k_c$ = 5.070 agrees well enough with its
provisional $k'$ = 5.13 by equation (18), solved with the totals for the series, to need only
one approximation.


Table 4. Estimation of $k_c$ within 128 plots of N = 10 initial counts of Leptinotarsa, from the
same data as Table 1; plots grouped by size of total count, U = $\Sigma u$, with f plots in each
grouping interval

   U         f    $\Sigma U$   $\Sigma\Sigma u^2$   $\Sigma U^2$      $x'$       $y'$      $\bar{u}$     $y'/x'$
  7-17       7       71       171       801      1.000     0.4286    1.014    0.4286
 18-22       7      146       496     3,052      4.057     0.9429    2.086    0.2324
 23-27       9      225       957     5,643      5.785     2.348     2.500    0.4059
 28-32      19      571     2,607    17,193      8.530     2.186     3.005    0.2563
 33-37      10      353     1,737    12,477     11.933     1.907     3.530    0.1598
 38-42      11      444     2,432    17,934     15.658     2.414     4.036    0.1542
 43-47       4      183     1,071     8,377     20.29      1.906     4.575    0.0939
 48-52      13      645     4,447    32,027     23.57      5.674     4.962    0.2407
 53-57       8      440     3,142    24,216     29.27      4.506     5.500    0.1539
 58-62      11      653     5,027    38,779     34.09      5.671     5.936    0.1664
 63-67       8      523     4,111    34,203     41.79      3.056     6.538    0.0731
 68-72       5      352     3,262    24,790     47.84     10.360     7.040    0.2166
 73-82       6      469     4,719    36,699     59.22     11.611     7.817    0.1961
 83-92       3      262     3,058    22,902     73.50     19.704     8.733    0.2681
 93-102      5      484     6,144    46,874     90.51     22.689     9.680    0.2507
113-132      2      242     3,616    29,380    143.13     25.567    12.100    0.1786
Total      128    6,063    46,997   355,347       -          -         -      3.4753

$k'$ = 308,350/60,056 = 5.13 (eqn. 18 solved with totals). $wx'$ = 102.214f/$(\bar{u}+k')^2$ (eqn. 19).
$\Sigma(wx'^2)$ = 2768.528, $\Sigma(wx'y')$ = 546.1114, $k_c$ = 5.070.
$\Sigma(wy'^2)$ = 121.0497, $B_0^2$ = 107.7243, $\chi^2$ = 13.3254, n = 14.
$1/k_c$ = 0.1973 ± 0.0190, t = 1.960 at P = 0.05.
Confidence limits: for $1/k_c$, 0.2345 and 0.1600; for $k_c$, 4.26 and 6.25.


For a second example, grouping may be applied to the lesion counts on each half of each
of the first two leaves in 120 potted bean seedlings, as reported by Kleczkowski (1949) in
experiments with tobacco virus. From the grouped data in Table 5, the ratio $y'/x' = 1/k_1$
tended to decrease as the mean increased. The provisional $k'$ = 16.3 from equation (20)
gave a computed $k_c$ = 17.82 and the next approximation $k_c$ = 17.90, which is near the lower
of the limits of 15 and 84 given by Kleczkowski for his unweighted intercept estimate of
k = 41.

The abbreviated calculation with equation (18a) for two counts in each plot is illustrated 
in Table 6 with Bartlett’s (1936) leather-jacket counts from an insecticide experiment in 
randomized blocks. The differences between the two subplots in each plot have been grouped 
on the basis of the plot totals U (Table 7), ignoring differences between treatments and 






























between blocks. Starting with the unweighted estimate of $k'$ = 14.26, weighting gave
$k_c$ = 19.00, which was changed in a second approximation to $k_c$ = 18.32.

(2.5). Tests for agreement with a single $k_c$. Agreement with a single $k_c$ may be judged
initially by an overall $\chi^2$, which follows from the weighting of each contribution inversely
as its variance. A column of $wy'$ is added to the work-form, most easily as the product of
$(wx')(y'/x') = wy'$, and the products of $y' \times wy'$ are summed over the g distributions in the




Table 5. Estimation of $k_c$ with grouping; numbers of lesions of tobacco necrosis virus
on N = 4 half-leaves of 120 bean seedlings (Kleczkowski, 1949)

                      From initial counts u               Variates
Class    No. of                                       Lesions
limits   plants                                       per leaf
$\bar{u}$      f      $\Sigma U$     $\Sigma U^2$    $\Sigma\Sigma u^2$      $\bar{u}$       $x'$       $y'$      $y'/x'$
 25       4       246       17,066      5,048     15.38      250.4     49.75    0.19868
 35       3       330       36,324      9,966     27.50      732.2     70.83    0.09674
 45       4       671      113,183     30,833     41.94    1,715.6    169.50    0.09880
 55       6     1,204      242,472     65,570     50.17    2,457      224.9     0.09153
 65      11     2,634      632,576    167,428     59.86    3,524      221.5     0.06285
 75      10     2,769      768,337    206,323     69.22    4,683      405.4     0.08657
 85      15     4,847    1,567,991    408,801     80.78    6,440      292.6     0.04543
 95      10     3,597    1,295,857    335,637     89.92    8,002      299.2     0.03739
105       9     3,547    1,398,947    373,365     98.53    9,496      776.6     0.08178
115       4     1,722      741,506    193,922    107.62   11,408      604.5     0.05299
125       4     1,907      909,229    237,041    119.19   14,004      692.0     0.04941
135       9     4,635    2,387,721    615,801    128.75   16,407      570.2     0.03475
145       6     3,336    1,856,240    483,038    139.00   19,072      915.3     0.04799
155       8     4,818    2,902,168    755,832    150.56   22,358    1,111.5     0.04971
175       7     4,550    2,960,868    762,440    162.50   26,172      895.7     0.03422
205       6     4,504    3,388,422    865,206    187.67   35,045      817.9     0.02334
275       4     3,801    3,635,873    948,159    237.56   55,994    3,028.3     0.05408
Total   120    49,118   24,854,780  6,464,410

First estimate: trial $k'$ = 120/7.38161 = 16.3 (eqn. 20). A = 386.59 (eqn. 15).
$\Sigma(wx'^2)$ = 32,042.0, $\Sigma(wx'y')$ = 1798.02, $k_c$ = 17.821 (eqn. 17).
Second estimate: A = 463.24, $\Sigma(wx'^2)$ = 37,339.3, $\Sigma(wx'y')$ = 2085.46, $k_c$ = 17.905 (eqn. 17).
$\Sigma(wy'^2)$ = 141.204, $B_0^2$ = 116.476 (eqn. 21), $\chi^2$ = 24.729, n = 15 (eqn. 11).
$\Sigma w$ = 0.00879575, $[wx'^2]$ = 31,439.3, $[wx'y']$ = 1495.26, $[wy'^2]$ = 82.1657 (eqn. 22).
$B^2$ = 71.115 (eqn. 24). C = 59.039 (eqn. 23).

Effect of                       D.F.    S.S.       M.S.       F
Slope, $1/k_c$                    1     116.476    116.476    147.57
Computed intercept against 0     1      13.678     13.678     17.33
Error                           14      11.051      0.7893

























series to obtain $\Sigma(wy'^2)$. Part of this total, with 1 D.F., can be attributed to the slope of the
line fitted with a zero intercept,

$$B_0^2 = \Sigma^2(wx'y')/\Sigma(wx'^2). \qquad (21)$$

The remainder is an approximate $\chi^2$, $\chi^2 = \Sigma(wy'^2) - B_0^2$ (equation (11)), nominally with
g - 2 D.F. Unlike $k_c$, $\chi^2$ is relatively sensitive to a discrepancy between the estimated $k_c$
and the provisional $k'$. Before calculating the column of $wy'$, these should agree substantially,
which may require computing an additional cycle.


Table 6. Estimation of $k_c$ from subplot counts (N = 2) in an experiment in randomized blocks
for the control of leather-jackets (Bartlett, 1936); for plot totals U, see Table 7

Differences between u within plots, d

Block              % of toxicant
no.       0     0    0.2   0.4   0.5   0.6
 1       26     6     3     5     4     9
 2       12     0     5     2     3     1
 3        8     3     3     2     8     3
 4       22    19     9     3     0     0
 5        5    20     3     0     4     3
 6       34    14     8    12     6     1

$\Sigma U^2$ = 85,256 (from Table 7), $\Sigma d^2$ = 4176.

Estimation of $k_c$ within plots from grouped data

   U        f     $\bar{u}$       $x'$       $y'$      $y'/x'$    First $wx'$   Second $wx'$
  0-9       5     3.10      9.40     0.400    0.04255     1.6888       1.8492
 10-19      9     6.83     45.89    -0.111   -0.00242     2.0598       2.4366
 20-29      7    12.36    144.1      9.00     0.06244     1.0056       1.2858
 30-49      5    21.90    475.6     -2.40    -0.00505     0.3893       0.5399
 50-69      6    30.08    859.8     70.00     0.08141     0.3107       0.4499
 80-99      2    43.25   1792.5    128.0      0.07141     0.0616       0.0932
120-139     2    63.50   3839.5    346.5      0.09025     0.0337       0.0531

First estimate: $k'$ = 81,080/5684 = 14.26 (eqn. 18). $wx'$ = 101.80f/$(\bar{u}+k')^2$ (eqn. 19).
$\Sigma(wx'^2)$ = 947.46, $\Sigma(wx'y')$ = 49.874, $k_c$ = 18.997 (eqn. 17).
Second estimate: $k'$ = 19.00, $wx'$ = 180.63f/$(\bar{u}+k')^2$, $k_c$ = 1329.09/72.567 = 18.315; $\chi^2$ = 1.875,
n = 5 (eqn. 11).
Confidence limits: for $1/k_c$, 0.05460 ± 0.05376; for $k_c$, 9.23 and 1190.


For a second test of validity, an intercept component with 1 D.F. may be split off from
the approximate $\chi^2$. It measures the difference between two straight lines, one fitted with
and the other without the constraint of a zero intercept; the remainder provides an empirical
error for testing the significance of the difference. An additional term is needed, the sum
of the weights $\Sigma w$, which may be obtained by successive cumulative division of $wx'$ by $x'$.



















The weighted sums of squares and products for the g series are then reduced to deviations
about their means, rather than about zero, by computing

$$[wx'^2] = \Sigma(wx'^2) - \Sigma^2(wx')/\Sigma w,$$

$$[wx'y'] = \Sigma(wx'y') - \Sigma(wx')\,\Sigma(wy')/\Sigma w, \qquad (22)$$

and

$$[wy'^2] = \Sigma(wy'^2) - C,$$

where

$$C = \Sigma^2(wy')/\Sigma w. \qquad (23)$$

The variation attributable to the slope of the line passing through the origin, $B_0^2$, is defined
in equation (21); that accounted for by a slope without this constraint is

$$B^2 = [wx'y']^2/[wx'^2]. \qquad (24)$$

The required test may be arranged as an analysis of variance:

Effect of                       D.F.     S.S.                 M.S.      F
Slope, $1/k_c$                    1      $B_0^2$              $B_0^2$   $B_0^2/s^2$
Computed intercept against 0     1      $C + B^2 - B_0^2$    $I_0$     $I_0/s^2$
Error                           g - 3   $[wy'^2] - B^2$      $s^2$























If a single $k_c$ is justified, the F value in the first row should be clearly significant and that
in the second row not significant. A significant F in the second row indicates a progressive
change in k. The sum of squares for error is in effect an approximate $\chi^2$ test of the homogeneity
of the component distributions, after allowing for a linear trend of $y'$ on $x'$ with a
non-zero intercept.

Both tests are illustrated in Table 1. By cumulative multiplication of the columns for
y' and wy', Σ(wy'²) = 49.5943, of which B_0² = 30.3806 could be attributed to the slope of the
fitted line, the remainder being χ² = 19.214 with 14 D.F. k_c could be accepted at once as
valid, but since the approximate χ² exceeded its degrees of freedom, the zero intercept has
been tested. By cumulative division of each wx' by its corresponding x', we obtain
Σw = 0.0063082, required in computing [wx'²], [wx'y'] and [wy'²] with equations (22), and
from them B² = 14.4831, the variation attributable to a slope not constrained to pass
through zero. The resulting analysis of variance at the bottom of the table reveals no
trend which would make a zero intercept untenable.
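The partition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the input values are hypothetical, the slope through the origin is taken in the usual weighted-regression form Σ²(wx'y')/Σ(wx'²) (equation (21) itself is not reproduced in this section), and the error degrees of freedom follow the g − 3 printed in the analysis.

```python
def zero_intercept_anova(w, x, y):
    """Partition sum(w*y'^2) into slope-through-origin, intercept and error terms.

    w, x, y are equal-length sequences of weights, x' values and y' values.
    """
    g = len(w)
    if g < 4:
        raise ValueError("need at least four component distributions")
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    swyy = sum(wi * yi * yi for wi, yi in zip(w, y))

    b0_sq = swxy ** 2 / swxx      # slope forced through the origin (assumed form of eq. 21)
    c = swy ** 2 / sw             # correction term C, eq. (23)
    dxx = swxx - swx ** 2 / sw    # [wx'^2], eq. (22)
    dxy = swxy - swx * swy / sw   # [wx'y'], eq. (22)
    dyy = swyy - c                # [wy'^2], eq. (22)
    b_sq = dxy ** 2 / dxx         # slope without the constraint, eq. (24)

    ss_intercept = c + b_sq - b0_sq
    ss_error = dyy - b_sq
    s2 = ss_error / (g - 3)       # error D.F. as printed in the analysis
    return {
        "total_ss": swyy,
        "slope_ss": b0_sq,
        "intercept_ss": ss_intercept,
        "error_ss": ss_error,
        "s2": s2,
        "f_slope": b0_sq / s2,
        "f_intercept": ss_intercept / s2,
    }


# Hypothetical inputs, chosen only to exercise the arithmetic.
result = zero_intercept_anova(
    w=[1.0, 2.0, 1.0, 3.0, 2.0],
    x=[1.0, 2.0, 3.0, 4.0, 5.0],
    y=[0.20, 0.35, 0.70, 0.75, 1.10],
)
```

The three sums of squares add back to Σ(wy'²), which gives a convenient check on the arithmetic.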

The catches of haddock (Table 2), the number of potato beetles within plots (Table 4)
and the subplot counts of leather-jackets (Table 6) were equally consistent with a common
k_c, as judged by their respective χ²'s. In contrast, the wireworm counts in Table 3 and the
viral lesions in Table 5 were characterized by significant non-zero intercepts, with the
residual variation far in excess of expectation for the wireworms, but well within the
sampling error for the viral lesions. In both examples, 1/k decreased approximately
linearly as log ū increased, the unweighted regression in Fig. 4 accounting for 68% of the
variation in 1/k_1, with b = −0.626 ± 0.091, and in Fig. 5, where each point was weighted by
the number of seedlings (f), for 67% of the variation, with b = −0.111 ± 0.020. When
log (1/k_1) was plotted against log ū, the two linear regressions accounted respectively for
66 and 62% of the variation, but were sufficiently sensitive to small values of 1/k_1 that the
counts in Table 3 for fields nos. 14 and 16 with similar ū's have been combined. Neither








50 Negative binomial distributions with a common k 


regression in Fig. 6 differed significantly in slope from −0.5 (b = −0.409 ± 0.065 for
wireworms and b = −0.629 ± 0.127 for viral lesions). Either regression would discredit the
suitability of a single, unqualified k_c.

Heterogeneity in k may be due to other sources. In data from an insecticidal field test, 
the treated series may have a relatively stable k that differs from that for the untreated 





Fig. 5. Relation of the observed 1/k_1 = y'/x' to log ū from the grouped distributions of viral lesions in
Table 5, where ū is the mean number of lesions per half-leaf in each grouping interval. (Horizontal axis: log ū.)


















Fig. 6. Relation of 1/k_1 in logarithms to log ū for the data in Figs. 4 and 5, in two panels (wireworms;
viral lesions), each with horizontal axis log ū. The fitted regressions (solid lines) do not differ
significantly from a slope of b = −0.5 (broken lines). The shaded circle in the wireworm plot
represents two fields and has been given double weight in fitting the line.







controls but can be used in comparisons of the treated plots. Quadrat counts of hemlock
seedlings in plots exposed to sunlight during part of the day had a different k from those in
total shade (Olson, 1954). In some cases, heterogeneity can be traced to an occasional
'wild' value, which, if justifiably an outlier, may be omitted in computing k_c. When the
heterogeneity persists and a single k is warranted by the F test for B, the weighted k_c may
be no better as an estimate than the harmonic mean of the individual unweighted k's. To
balance the weighting appropriate for a homogeneous series and equal weighting,
semi-weights or partial weights may be considered (Cochran, 1954).

(2.6). Precision of k_c. Since the distribution of 1/k_c is the more nearly symmetrical,
the standard errors of k_c are computed in terms of 1/k_c. By usual regression theory
we have for the variance of the slope

\[ V(1/k_c) = 1/\Sigma(wx'^2), \tag{25} \]

when the data are consistent with a common k as judged by χ². If χ² with n D.F. exceeds its
expected value at, say, P < 0.1, but the regression meets the test for a zero intercept, an
approximate variance may be computed as

\[ V(1/k_c) = \chi^2/\{n\,\Sigma(wx'^2)\}, \tag{25a} \]

although the weights used in computing 1/k_c are now of doubtful validity. If its variance is
computed with equation (25), confidence limits for 1/k_c are determined with the square root
of the above variance multiplied by the normal deviate (i.e. Student's t for n = ∞) for the
selected level of probability, and then inverted to obtain the limits for k_c. If the variance of
the slope is increased by χ²/n (equation (25a)), Student's t is that for the degrees of freedom
in χ², and the limits are approximate at best. Confidence limits for homogeneous series are
illustrated in Tables 1, 2, 4 and 6.
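As a rough check on the procedure, the limits quoted for the leather-jacket series (9.23 and 1190) can be approximately recovered in Python. The sketch below assumes that the two numbers in the quotient 1329.09/72.567 printed with that series are Σ(wx'²) and Σ(wx'y'); the function name is hypothetical, and only the homogeneous case of equation (25) is covered.

```python
from statistics import NormalDist


def k_confidence_limits(sum_wxx, sum_wxy, level=0.95):
    """Limits for k_c obtained by inverting normal-theory limits for 1/k_c."""
    inv_k = sum_wxy / sum_wxx                 # slope through the origin, 1/k_c
    se = (1.0 / sum_wxx) ** 0.5               # square root of eq. (25)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    inv_low, inv_high = inv_k - z * se, inv_k + z * se
    k_lower = 1.0 / inv_high                  # inversion swaps the two ends
    k_upper = 1.0 / inv_low if inv_low > 0 else float("inf")
    return 1.0 / inv_k, k_lower, k_upper


# Sums assumed to be those behind k_c = 1329.09/72.567 = 18.315.
k_c, k_lower, k_upper = k_confidence_limits(1329.09, 72.567)
print(round(k_c, 3), round(k_lower, 2), round(k_upper))
```

Because the lower limit for 1/k_c lies very close to zero here, the upper limit for k_c is sensitive to rounding, which is why the sketch only agrees with the printed 1190 to about two figures.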


3. A COMMON k FROM A TEST FOR ADDITIVITY 


Not infrequently, the results of a field experiment are recorded as counts and evaluated by
an analysis of variance. Although sometimes computed in terms of square roots, which
would stabilize the variance of a Poisson-type count, the data are commonly over-dispersed
and need a transformation appropriate to the negative binomial. This requires an estimate
of k_c. Most experimental designs, however, lack the basis for a regression estimate of k_c,
so that we need a new approach.

Two transformations have been proposed for stabilizing the variance of negative binomial
distributions, both depending upon an estimated common k. The simpler of these, and the
only one considered here, is the logarithmic transformation described by Anscombe (1948),
where y = log (u + ½k). A somewhat more effective transform is that given by Beall (1942),
y = √k sinh⁻¹ √(u/k), which he has tabled for different values of u and of 1/k. When k is
known, either transformation should stabilize the variance effectively and, judging from
experience, lead concurrently to additivity. Of the two objectives, additivity is often
deemed the more important and, moreover, can be tested conveniently. This suggests
reversing the usual order, and by successive approximation, selecting that value of k giving
zero non-additivity in terms of the log-transform.
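The two transformations can be written as short Python functions. This is a minimal sketch with hypothetical function names; the use of common (base 10) logarithms is an assumption, made because it reproduces the transformed values tabulated later for the leather-jacket counts.

```python
import math


def anscombe_log(u, k):
    """Logarithmic transform y = log10(u + k/2) for a count u and a common k."""
    return math.log10(u + 0.5 * k)


def beall_asinh(u, k):
    """Inverse hyperbolic sine transform y = sqrt(k) * asinh(sqrt(u / k))."""
    return math.sqrt(k) * math.asinh(math.sqrt(u / k))


# A count of 92 with k/2 = 12.6 gives about 2.02 on the log scale.
print(round(anscombe_log(92, 25.2), 2))
# For very large k the second transform approaches the square root of the count.
print(round(beall_asinh(4.0, 1.0e6), 3))
```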

(3.1). A test for non-additivity. Our test is that described by Tukey (1949, 1955) for a type
of systematic non-additivity occurring in experiments in randomized blocks, Latin squares
and other designs. In a cross-classification, for example, a mean square for non-additivity,
with 1 D.F., is isolated from the interaction of rows by columns and compared by an F
test with the mean square from the remaining interactions. The non-additivity represents
the regression of the random element in each cell of the table upon the product of two
deviations from the general mean, that of the column mean and that of the row mean. If y
is an individual measurement, each random element e is defined as e = y − ȳ_b − ȳ_t + ȳ,
where ȳ_b is the mean of its block, ȳ_t of its treatment, and ȳ is the general mean. The
corresponding product of the deviations of the two means for each cell is computed as
x = (ȳ_b − ȳ)(ȳ_t − ȳ). The variation accounted for by the slope, B' = [xe]/[x²], is the sum of
squares for non-additivity, B_n² = [xe]²/[x²].

Plotting each e against x enables one to see what is happening, but by sacrificing this
information, we can simplify quite materially the calculation for randomized groups or
blocks (Bliss & Calhoun, 1954). The standard table of y with its usual marginal totals,
T_b for the f blocks and T_t for the h treatments, and its overall total of T = Σy for N = hf
counts, is augmented by a column either of Σ(T_t y), or of Σ(T_b y), for calculating the sum of
products Σ(T_b T_t y). The total of this extra column of products is equal to ΣT_t² or to ΣT_b²
respectively. The remaining calculation is primarily that for a standard analysis of variance
for randomized blocks. It is summarized in the following work-form, where

\[ S_x = \Sigma(T_b T_t y) - T(S_b + S_t + C_m). \]

The regression of e upon the x then has the slope B' = S_x/(S_b S_t).











| 
Row   Term              D.F.         S.S.                        M.S.     F
1     Blocks            f − 1        ΣT_b²/h − C_m = S_b         -        -
2     Treatments        h − 1        ΣT_t²/f − C_m = S_t         -        -
3     Non-additivity    1            S_x²/(S_b S_t N) = B_n²     B_n²     B_n²/s²
4     Error             N − f − h    Remainder                   s²       -
5     Total             N − 1        Σy² − C_m = [y²]
6     Correction        1            T²/N = C_m

S_x = Σ(T_b T_t y) − T(S_b + S_t + C_m)























The short test for non-additivity has been applied in Table 7 to the initial counts per plot
of leather-jackets, which are designated as Σu = U in the regression analysis of Table 6
but for this purpose will be called y. The sum of products for the first block is

Σ(T_t y) = 92 × 501 + 66 × 376 + ... + 25 × 59 = 80,858.

The total of this additional column is equal to ΣT_t², which serves as a check. With the sums
of squares in rows 1, 2 and 6 of the analysis of variance in Table 7, computed as in the above
work-form, we can compute S_x and the sum of squares B_n² for non-additivity in row 3.
The remaining interaction of blocks by treatments provides the error variance s². In the
present case, 1 D.F. between treatments represents a dummy comparison between duplicate
control plots, which in a later analysis we will transfer to the error. It is clear from the very
significant F in row 3 (P < 0.005) that these original counts are not in suitable terms for an
analysis of variance.
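The work-form can be followed step by step in Python using the 6 × 6 table of leather-jacket counts from Table 7. The sketch below is illustrative only: the function and dictionary key names are hypothetical, and the subscripted symbols follow the reading S_b, S_t, C_m, S_x and B_n² adopted for the work-form.

```python
# Counts y from Table 7: rows are blocks 1-6, columns are the six treatments.
COUNTS = [
    [92, 66, 19, 29, 16, 25],
    [60, 46, 35, 10, 11, 5],
    [46, 81, 17, 22, 16, 9],
    [120, 59, 43, 13, 10, 2],
    [49, 64, 25, 24, 8, 7],
    [134, 60, 52, 20, 28, 11],
]


def nonadditivity_test(table):
    """Short test for non-additivity in a blocks-by-treatments table."""
    f, h = len(table), len(table[0])
    n = f * h
    t_b = [sum(row) for row in table]                              # block totals
    t_t = [sum(row[j] for row in table) for j in range(h)]         # treatment totals
    total = sum(t_b)
    c_m = total ** 2 / n                                           # row 6, correction
    s_b = sum(t * t for t in t_b) / h - c_m                        # row 1, blocks
    s_t = sum(t * t for t in t_t) / f - c_m                        # row 2, treatments
    sum_prod = sum(t_b[i] * t_t[j] * table[i][j]
                   for i in range(f) for j in range(h))            # sum of T_b * T_t * y
    s_x = sum_prod - total * (s_b + s_t + c_m)
    b_n_sq = s_x ** 2 / (s_b * s_t * n)                            # row 3, non-additivity
    total_ss = sum(y * y for row in table for y in row) - c_m      # row 5, total
    error_ss = total_ss - s_b - s_t - b_n_sq                       # row 4, remainder
    s2 = error_ss / (n - f - h)
    return {"s_b": s_b, "s_t": s_t, "c_m": c_m, "s_x": s_x,
            "b_n_sq": b_n_sq, "s2": s2, "f": b_n_sq / s2}


result = nonadditivity_test(COUNTS)
print({name: round(value, 2) for name, value in result.items()})
```

Run on these counts, the sketch agrees with the printed analysis to within rounding: B_n² is close to 2,147 and the variance ratio close to 10.2.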

As a second example, eight dummy treatments have been assigned at random to the eight 







plots in each block of the data underlying Table 1 and tested for non-additivity, substituting
the original counts (u) of the potato beetles in each plot for y in the work-form. With the
randomization tested in Table 8, the original counts did not meet the requirement of
additivity. Ten additional randomizations of the same original counts gave variance ratios

Table 7. Test for non-additivity applied to the total count U (= y) 
of leather-jackets in each plot (Bartlett, 1936) 
































                    y at each % of toxicant
Block      0      0     0.2    0.4    0.5    0.6       T_b          Σ(T_t y)
1         92     66     19     29     16     25        247            80,858
2         60     46     35     10     11      5        167            56,495
3         46     81     17     22     16      9        191            61,300
4        120     59     43     13     10      2        247            93,059
5         49     64     25     24      8      7        177            57,345
6        134     60     52     20     28     11        305           105,127
T_t      501    376    191    118     89     59     1334 = T     454,184 = ΣT_t²

S_b + S_t + C_m = 78,055, S_x = Σ(T_b T_t y) − T(S_b + S_t + C_m) = 2,188,160, N S_b S_t = 2,229,810,016;
Ū = ȳ = 1334/36 = 37.06, x' = 1367.6, y' = 173.54, k_c = 7.881.

Row   Term              D.F.    S.S.              M.S.          F
1     Blocks              5     2,358 = S_b       472           2.24
2     Treatments          5     26,265 = S_t      5,253         24.94
3     Non-additivity      1     2,147 = B_n²      2,147         10.19
4     Error              24     5,054             210.6 = s²    -
5     Total              35     35,824
6     Correction          1     49,432 = C_m








for non-additivity ranging from F = 0.08 to 8.54, with a median F = 2.32. In the absence
of real differences between treatments, the test is relatively sensitive to chance variations
in the arrangement of plots within blocks.

(3.2). Provisional k for transforming counts. Given the above test procedure, the required
½k may be computed, by successive approximations, as the value giving zero non-additivity
when the count in each plot (u) is transformed to y = log (u + ½k). Occasionally, the trial
value for starting the iteration can be based upon past or concurrent experience with other
similar tests, or a well-founded a priori k accepted without change, if the non-additivity
is no larger than the residual error. More often, the initial k' must be computed from the
evidence of each experiment. Two provisional estimates may be suggested.








For the first, separate counts are recorded on two or more equal subsections of each plot,
leading to a trial k' based upon subsampling within plots. An intraplot k_c can then be
computed by regression, with grouping where desirable, as has been described. Further analyses,
however, are based upon the sum of the subsamples in each plot, so that we require not an
intraplot but an interplot k_c. Their relation depends upon the correlation between adjacent
subplot units. If half-plots, say, were completely correlated, so that r = 1, both k's would be
the same. This condition was approximated in counts of Lespedeza in an old meadow
(F. C. Evans, 1952). When the same counts were combined successively in quadrats doubling


Table 8. Test for non-additivity of the original plot counts u from the uniformity data in 
Table 1 when dummy treatments 1-8 were assigned at random to plots in each block 


Analysis of variance 























| 
Term                     D.F.     S.S.        M.S.            F
Blocks                    15      41,840      2,789.3         12.64
Treatments                 7       1,791      255.9           1.16
Non-additivity, B_n²       1       1,572      1,572           7.12
Error                    104      22,957      220.74 = s²     -
Total                    127      68,160
Correction                 1     287,187

S_b + S_t + C_m = 330,819, S_x = 3,883,407, S_b S_t N = 9,594,121,743; ū = 6063/128 = 47.367, x' = 2241.9,
y' = 173.37, k_c = 12.931.


in size from jg to 1 m.?, the estimated k’s did not change with size. A map of the meadow 
showed considerable sections lacking this plant, which led to a high correlation between 
adjacent quadrats. At the other extreme, a zero correlation between adjacent half-plots
would give a plot k_c twice that of the half-plot k_c. Most cases will fall between these extremes,
so that k_c from the subplot analysis should give an approximate lower limit and N k_c an
approximate upper limit to the k_c between plots.

In the counts of Leptinotarsa, k_c = 5.07 within plots (Table 4) and k_c = 13.51 between
these same plots within blocks (Table 1). Since each plot total was based upon N = 10
subplot units, k_c between plots should be intermediate between k_c = 5.07 within plots and
N k_c = 10 × 5.07. The observed k_c between plots (13.51) falls within this range.

In a randomized block or Latin-square experiment, a provisional k_c may be estimated
alternatively from an analysis of variance of the original single or total count u within each
plot. After removing the non-additivity, the mean square for the residual error gives a
variance s², and all of the observations an overall mean ū, from which we can compute
x' and y' in equation (2) and a trial value of k_c.
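The arithmetic of this provisional estimate is short enough to sketch in Python. Equation (2) is not reproduced in this section, so the forms x' = ū² − s²/N and y' = s² − ū used below are an assumption; they are, however, consistent with the values of x', y' and k printed beneath Tables 7 and 8. The function name is hypothetical.

```python
def provisional_k(mean, s2, n):
    """Trial k from an overall mean, an error variance and the number of counts.

    Assumes x' = mean**2 - s2/n and y' = s2 - mean, so that k = x'/y'.
    """
    x_prime = mean ** 2 - s2 / n
    y_prime = s2 - mean
    return x_prime, y_prime, x_prime / y_prime


# Potato beetles (Table 8): mean 47.367, error variance 220.74, N = 128 plots.
print([round(v, 3) for v in provisional_k(47.367, 220.74, 128)])
# Leather-jackets (Table 7): mean 37.06, error variance 210.6, N = 36 plots.
print([round(v, 3) for v in provisional_k(37.06, 210.6, 36)])
```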

The calculation is illustrated in Table 8 with the dummy experiment on potato beetles.
From the mean number of beetles per plot, ū = 6063/128 = 47.367, and the error variance
from the analysis of variance, s² = 220.74, we find, by equation (2), k_c = x'/y' = 12.93,
which compares favourably with the regression estimate from the same data of k_c = 13.51
(Table 1). Unlike B_n² for non-additivity, this estimate of k_c proved relatively insensitive
to chance differences in plot assignment within blocks, varying within a range of k_c = 11.01
and 13.47 in eleven dummy experiments superimposed upon the same original counts.
Trial estimates for the leather-jacket counts in Table 7 can also be computed by two
procedures. From the regression estimate of the intraplot k_c = 18.32 and N = 2 (Table 6),
an interplot k_c between 18 and 36 would be anticipated, apart from the relatively wide
confidence limits of k_c (9.23 and 1190). Alternatively, the mean count (ū) and error mean
square (s²) of the plot counts in Table 7 gave k_c = 7.88, less than the lower limit predicted


Table 9. Terms required in approximating ½k for zero non-additivity with
the transform y = log (U + ½k), from the data in Table 7

½k'       C_m        S_b       S_t        S_x         B_n
9        91.3144    0.1719    2.2490    −0.10891    −0.02919
11       92.0320    0.1475    2.0394    −0.05929    −0.01801
14       96.3342    0.1260    1.7538     0.04052     0.01437
12.6     94.3488    0.1384    1.8719    −0.00433    −0.00142

Table 10. Plot counts of surviving leather-jackets in Table 7 with
metameter y = log (U + ½k) giving zero non-additivity

                 y = log (U + 12.6) for dose
Block      0       0      0.2     0.4     0.5     0.6      T_b      Σ(T_t y)
1        2.02    1.90    1.50    1.62    1.46    1.58    10.08     99.3864
2        1.86    1.77    1.68    1.35    1.37    1.25     9.28     91.9808
3        1.77    1.97    1.47    1.54    1.46    1.33     9.54     94.2321
4        2.12    1.85    1.75    1.41    1.35    1.16     9.64     96.2643
5        1.79    1.88    1.58    1.56    1.31    1.29     9.41     93.1095
6        2.17    1.86    1.81    1.51    1.61    1.37    10.33    102.3509
T_t     11.73   11.28    9.79    8.99    8.56    7.98    58.28    577.3240

















from the intraplot variation. When the two estimates differ so widely, the regression
technique would be given greater weight in selecting an initial trial value, say, of ½k' = 9 for
starting the iterations.

(3.3). Estimation of k from the log counts. Whatever its source, the initial trial value of
½k' is added to each plot count; the logarithm of the sum is our metameter y = log (U + ½k').
With the work-form for non-additivity, we compute the sums of squares in rows 6, 1 and 2,
their sum (C_m + S_b + S_t), S_x, and the test criterion

\[ B_n = S_x/\sqrt{S_b S_t N}, \tag{26} \]

where N is the total number of plots or counts. If B_n is negative, ½k' is increased; if positive,
½k' is decreased. The calculation is then repeated with a new series of y's. The objective is to
find two values of ½k', one giving a small positive B_n and the other a small negative B_n,
from which the value of ½k' giving B_n = 0 can be interpolated. Empirical trial suggests
that the average of two interpolated estimates, one based upon ½k' and the other upon 2/k',
is often nearer to the desired value than either direct or harmonic interpolation alone.
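The two interpolations and their average can be sketched in Python. The function name is hypothetical; the numerical inputs are the two bracketing rows of Table 9 (½k' = 11 with B_n = −0.01801, and ½k' = 14 with B_n = 0.01437).

```python
def interpolate_half_k(h1, b1, h2, b2):
    """Interpolate the trial value giving B_n = 0 between two bracketing trials.

    h1, h2 are trial values of k/2; b1, b2 are the corresponding B_n of opposite sign.
    """
    if b1 * b2 >= 0:
        raise ValueError("the two B_n values must have opposite signs")
    frac = b1 / (b1 - b2)                              # share of the interval from h1 to the zero
    direct = h1 + (h2 - h1) * frac                     # linear in k/2
    reciprocal = 1.0 / h1 + (1.0 / h2 - 1.0 / h1) * frac
    harmonic = 1.0 / reciprocal                        # linear in 2/k, then inverted
    return direct, harmonic, 0.5 * (direct + harmonic)


direct, harmonic, average = interpolate_half_k(11, -0.01801, 14, 0.01437)
print(round(direct, 2), round(harmonic, 2), round(average, 1))
```

With these inputs the sketch gives about 12.67 and 12.49, averaging about 12.6.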

The estimation of ½k by minimizing non-additivity in the transformed counts is viewed,
primarily, as a device for finding a suitable metameter to be used in analysing a series of
counts. In some cases the method has led to negative values for ½k'. Although the form of the
underlying distribution may then be in doubt, the transformed data, conforming to the
basic assumption of additivity in the analysis of variance, may serve the purposes of the
experimenter quite as well as if ½k' were positive. Applications of the technique to field
experiments on insecticides have been promising, especially in one case where the weights
for a further probit analysis of several dosage-mortality curves were based upon the
interpolated k' (Bliss, 1958).


Table 11. Analysis of variance of transformed counts in Table 10 





























                                                                 F against      F against
Term                       D.F.    S.S.             M.S.         initial error  experimental error
Blocks                       5     0.1384 = S_b     0.02768      -              1.94
Treatments                   5     1.8719 = S_t     -            -              -
  Control against treated    1     1.5606           1.5606       -              109.59
  Linear on dose             1     0.2882           0.2882       -              20.24
  Non-linear                 2     0.0023           0.00115      -              0.08
  Dummy                      1     0.0208           0.0208       1.49           -
Non-additivity               1     0.000002         -            -              0.00
Initial error               24     0.3353           0.01397      1.00           -
Total                       35     2.3456
Correction                   1     94.3488 = C_m
Experimental error          25     0.3561           0.01424      -              1.00

S_b + S_t + C_m = 96.3591, S_x = −0.004334, √(S_b S_t N) = 3.05394, B_n = −0.001419.


The estimation of ½k in an experiment on insecticides may be illustrated with the
leather-jacket counts in Table 7. As discussed above, we may start with ½k' = 9 as our first trial
value, converting each of the N = 36 counts (U) to y = log (U + 9) and computing B_n
for non-additivity. The calculation is continued with successive trial values of ½k' until
with ½k' = 11 and 14 the corresponding B_n's are small negative and positive values (Table 9).
The required ½k' corresponding to B_n = 0 has been interpolated between 11 and 14 as
½k' = 12.67, and again between 1/11 and 1/14 as ½k' = 12.49, the two estimates averaging
½k' = 12.6. The final variates, y = log (U + 12.6), are shown in full in Table 10. The resulting
B_n = −0.00142 approximates zero so closely that the sum of squares for non-additivity
vanishes in the analysis of variance in Table 11. Two plots in each block represented
untreated controls. The 'dummy' comparison between them has been added to the initial
error, with 24 D.F., to obtain the experimental error in the last row. The largest contrast, of
course, is the difference in survival between the controls and the treated plots, followed by
the linear trend of the metameter against the concentration of insecticide. No other
treatment comparison approached significance.
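The successive approximation can also be handed to a simple root-finder. The Python sketch below swaps the hand interpolation for bisection on B_n, which is not the authors' procedure; function names are hypothetical, common logarithms are assumed, and the counts are those of Table 7. Since the printed work used logarithms rounded to two decimals, the root found with full-precision logarithms need not equal 12.6 exactly.

```python
import math

# Leather-jacket counts U from Table 7 (rows: blocks, columns: treatments).
COUNTS = [
    [92, 66, 19, 29, 16, 25],
    [60, 46, 35, 10, 11, 5],
    [46, 81, 17, 22, 16, 9],
    [120, 59, 43, 13, 10, 2],
    [49, 64, 25, 24, 8, 7],
    [134, 60, 52, 20, 28, 11],
]


def b_n(table, half_k):
    """Criterion B_n of equation (26) for the metameter y = log10(U + half_k)."""
    y = [[math.log10(u + half_k) for u in row] for row in table]
    f, h = len(y), len(y[0])
    n = f * h
    t_b = [sum(row) for row in y]
    t_t = [sum(row[j] for row in y) for j in range(h)]
    total = sum(t_b)
    c_m = total ** 2 / n
    s_b = sum(t * t for t in t_b) / h - c_m
    s_t = sum(t * t for t in t_t) / f - c_m
    s_x = sum(t_b[i] * t_t[j] * y[i][j] for i in range(f) for j in range(h))
    s_x -= total * (s_b + s_t + c_m)
    return s_x / math.sqrt(s_b * s_t * n)


def solve_half_k(table, lo, hi, tol=1e-6):
    """Bisect on half_k until B_n changes sign within an interval of width tol."""
    b_lo, b_hi = b_n(table, lo), b_n(table, hi)
    if b_lo * b_hi > 0:
        raise ValueError("B_n has the same sign at both ends of the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        b_mid = b_n(table, mid)
        if b_lo * b_mid <= 0:
            hi = mid
        else:
            lo, b_lo = mid, b_mid
    return 0.5 * (lo + hi)


half_k = solve_half_k(COUNTS, 9.0, 1000.0)
print(half_k, b_n(COUNTS, half_k))
```

The wide upper end of the bracket is deliberate: for a very large added constant the logarithm is nearly linear in U, so B_n takes the sign found for the untransformed counts.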







To gain some idea of the variation in ½k' when determined by minimizing B_n² in the absence
of differences between treatments, we have analysed similarly the uniformity data in
Table 1, when eight dummy treatments were assigned at random to the eight plots in each
of the sixteen blocks. In eleven independent randomizations of the same data, the
fluctuation in the variance ratio for B_n², when computed from the initial counts u, without
transformation, has already been noted. Even greater variability appeared in the values of ½k',
ranging in the eleven 'experiments' from −4.8 to more than 120 but with a median of
½k' = 7.4, in good agreement with k_c = 13.51 in Table 1.

The wide range in ½k' from identically the same counts in different randomized
combinations indicates the sensitivity of this estimate to random variation when there are no
real differences between treatments. Even when the plot counts differ markedly with
treatment, quite different values of ½k' may have relatively little effect upon comparisons
between treatments. The leather-jacket counts in Table 7, for example, when analysed with
½k' = 5 gave F = 0.51 for non-additivity instead of F = 0.0001 with ½k' = 12.6, but the three
treatment effects yielded variance ratios of F = 105.82, 24.19 and 0.26, quite similar to
F = 109.59, 20.24 and 0.08 for the same comparisons in Table 11. In practice, an exact
determination of the ½k giving B_n = 0 is probably unnecessary. Any value for which
B_n² < s² should meet the needs of the experimental biologist.


4. SUMMARY


The most widely applicable of the over-dispersed distributions, the negative binomial, is 
defined by the arithmetic mean and a parameter k. Comparisons between the means of two 
or more distributions are more direct and unequivocal if they have the same relative dis- 
persion in terms of k. Two approaches to a common k are described and illustrated with 
numerical examples. 

The first is a regression moment estimate that is applicable when the relevant variation
can be sampled. Two statistics, x' and y', are computed from the mean and variance of each
component distribution, such that their ratio, y'/x' = 1/k_1, is an estimate of 1/k and the
difference (y' − x'/k) has zero expectation. Given two or more component distributions,
a common 1/k_c can be estimated by successive approximations from the slope of y' upon x',
when the regression is constrained to pass through the origin and each y' is weighted by its
invariance. Agreement with a single k_c can be judged by a χ² test of the variation about the
regression, by agreement with a zero intercept, and by independence between 1/k_1 from the
component samples and their count means ū.

A second estimate of the common k is proposed for field experiments arranged in
randomized blocks or other restricted designs and evaluated by an analysis of variance in terms
of Anscombe's transform, y = log (u + ½k), of the plot counts u. It is proposed to estimate
the k in this transformation, again by iteration, as that value giving zero non-additivity
(B_n = 0) in the regression test described by Tukey. A simple form of the calculation is
described for randomized blocks. Since the resulting k provides the biologist with an additive
metric for analysing his experiment, its possible limitations as an estimate are considered
secondary.


This project was undertaken and much of it completed in 1953, when the senior author
was a guest of the Department of Genetics at Cambridge University, on leave from the
Connecticut Agricultural Experiment Station and Yale University. The authors are
grateful to Sir Ronald A. Fisher, John W. Tukey, Frank J. Anscombe and the referee for
their suggestions and assistance.


REFERENCES 


ANSCOMBE, F. J. (1948). The transformation of Poisson, binomial and negative-binomial data.
Biometrika, 35, 246-54.

ANSCOMBE, F. J. (1949). The statistical analysis of insect counts based on the negative binomial
distribution. Biometrics, 5, 165-73.

ANSCOMBE, F. J. (1950). Sampling theory of the negative binomial and logarithmic series distributions.
Biometrika, 37, 358-82.

BARTLETT, M. S. (1936). Some notes on insecticide tests in the laboratory and in the field. J.R. Statist.
Soc. Suppl. 3, 185-94.

BEALL, GEOFFREY (1939). Methods of estimating the population of insects in a field. Biometrika, 30,
422-39.

BEALL, GEOFFREY (1942). The transformation of data from entomological field experiments so that
the analysis of variance becomes applicable. Biometrika, 32, 243-62.

BLISS, C. I. (1958). The analysis of insect counts as negative binomial distributions. Proc. Tenth
Int. Congr. Entom., 1956.

BLISS, C. I. & CALHOUN, D. W. (1954). An Outline of Biometry. New Haven: Yale Co-operative Corp.

BLISS, C. I. & FISHER, R. A. (1953). Fitting the negative binomial distribution to biological data and
note on the efficient fitting of the negative binomial. Biometrics, 9, 176-200.

COCHRAN, W. G. (1954). The combination of estimates from different experiments. Biometrics, 10,
101-29.

EVANS, D. A. (1953). Experimental evidence concerning contagious distributions in ecology.
Biometrika, 40, 186-211.

EVANS, F. C. (1952). The influence of size of quadrat on the distributional patterns of plant
populations. Contr. Lab. Vertebr. Biol. Univ. Mich. no. 54, 1-15.

FISHER, R. A. (1941). The negative binomial distribution. Ann. Eugen., Lond., 11, 182-7.

JONES, E. W. (1937). Practical field methods of sampling soil for wireworms. J. Agric. Res. 54, 123-34.

KLECZKOWSKI, A. (1949). The transformation of local lesion counts for statistical analysis. Ann. Appl.
Biol. 36, 139-52.

MORRIS, R. F. (1954). A sequential sampling technique for spruce budworm egg surveys. Canad. J.
Zool. 32, 302-13.

OAKLAND, G. B. (1950). An application of sequential analysis to whitefish sampling. Biometrics, 6,
59-67.

OLSON, J. S. (1954). Germination and survival of eastern hemlock seedlings in Connecticut seedbeds.
Abstract Bull. Ecol. Soc. Amer. 35, 60.

TAYLOR, C. C. (1953). Nature of variability in trawl catches. U.S. Fish Wildlife Service, Fishery Bull.
54, 145-66.

TUKEY, J. W. (1949). One degree of freedom for non-additivity. Biometrics, 5, 232-42.

TUKEY, J. W. (1955). Answer to query 113. Biometrics, 11, 111-13.

WATERS, W. E. (1955). Sequential sampling in forest insect surveys. For. Sci. 1, 68-79.










[ 59 ] 


SIMPLIFIED METHODS OF FITTING THE TRUNCATED 
NEGATIVE BINOMIAL DISTRIBUTION 


By W. BRASS 
University of Aberdeen 


1. INTRODUCTION


The negative binomial distribution is frequently used to fit sample data. In some circum- 
stances the sample may be truncated at the lower end because the number of observations 
in the class with zero measurement cannot be isolated. Sampford (1955) gives as an example 
the distribution of breaks in irradiated cells which are at a particular stage of the mitotic 
cycle; cells not susceptible to breakage cannot be distinguished from susceptibles in which 
no break occurs. A further example from demographic research has recently been presented. 
Brass (1957) has shown that, in some circumstances, the number of children born per woman 
in a cohort of completed fertility, where all the women have been exposed to risk, is 
distributed, to a good approximation, in the negative binomial form. In most populations, 
it is not possible to sample only the women exposed to risk and the zero class, therefore, 
cannot be accurately determined. The distribution of children per mother, however, follows 
the truncated negative binomial form. 

Sampford (1955) has given methods for estimating the parameters of the truncated 
negative binomial distribution, by the use of the first two sample moments, and also from 
the maximum likelihood equations. By these methods the parameters are obtained, in 
each case, from the solution by successive numerical approximations of two equations. The 
solution of the moment equations is fairly laborious and that of the maximum likelihood 
ones considerably more so. 

This paper considers simplified methods of fitting the truncated negative binomial dis- 
tribution. Reasonably efficient estimates, which are easily calculated, will always be 
useful (a) for exploratory work when it is not clear which type of distribution should be 
fitted, and (b) to provide first-stage values in the iterative solution of the maximum likeli- 
hood equations. In some instances, even in the final stages of an investigation, the simplified 
method will be all that is needed. Whether the extra work required to find maximum 
likelihood estimates is justified by the gain from the increase in precision can only be 
decided by a balancing of advantages in each particular case. 


2. METHOD A. MODIFICATION OF EQUATIONS FOR ESTIMATION BY MOMENTS


In Fisher's (1941) notation, the truncated negative binomial distribution has the form

\[ P(r) = \frac{w^k}{1-w^k}\,\frac{(k+r-1)!}{(k-1)!\,r!}\,\eta^r \quad (r = 1, 2, \ldots), \tag{1} \]

where η = 1 − w.
The factorial moments are

\[ \mu'_{[j]} = \frac{(k+j-1)!\,\eta^j}{(k-1)!\,w^j(1-w^k)}, \]










60 Fitting the truncated negative binomial distribution 


and the first two moments about the origin

\[ \mu'_1 = \frac{k\eta}{w(1-w^k)}, \qquad \mu'_2 = \frac{k\eta(1+k\eta)}{w^2(1-w^k)}. \tag{2} \]

We will also write for the proportion in the first class of the truncated distribution

\[ P = P(1) = \frac{k\eta w^k}{1-w^k}. \tag{3} \]


The main difficulty in solving the equations (2) for the moment method of estimating the
parameters comes from the component w^k in the above expressions. If the third moment is
used to eliminate w^k from the relations, estimates of w and k in terms of the first three sample
moments are found very easily, as pointed out by David & Johnson (1952). However, as
these authors emphasize, the method is very inefficient because of the weight given to the
third sample moment.

Elimination of w^k from the equations can be achieved by many methods which do not
involve the third moment. The simplest seems to be by the use of the proportion in the
first class of the truncated distribution. If the expression for w^k from equation (3) is
substituted in (2) we obtain the following equations for the parameters in terms of the
population moments and P

\[ w = \frac{\mu'_1}{\sigma^2}(1-P), \qquad k = \frac{w\mu'_1 - P}{1-w}, \tag{4} \]

where σ² is the second moment about the mean.
Replacement of the moments and P by sample values leads to very simple estimates of
w and k,

\[ \bar{w} = \frac{m}{s^2}\left(1 - \frac{n_1}{n}\right), \qquad \bar{k} = \frac{\bar{w}m - n_1/n}{1-\bar{w}}, \tag{5} \]

where the bars denote the estimates, n_t is the number of sample observations with
measurement t, and n the total sample number. m and s² are the unbiased sample estimates of μ'_1
and σ², respectively. This is called, for convenience, Method A.
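Method A amounts to two lines of arithmetic, sketched here in Python. The estimates are taken in the form w̄ = (m/s²)(1 − n₁/n) and k̄ = (w̄m − n₁/n)/(1 − w̄), which is how equations (5) are read here; the function names are hypothetical, and the check uses exact population values for k = 2, w = 0.5 rather than real data.

```python
from statistics import mean, variance


def method_a(m, s2, p1):
    """Method A estimates (w, k) from the mean, variance and first-class proportion."""
    w = (m / s2) * (1.0 - p1)
    if not 0.0 < w < 1.0:
        raise ValueError("no acceptable solution: estimated w lies outside (0, 1)")
    k = (w * m - p1) / (1.0 - w)
    return w, k


def method_a_from_sample(counts):
    """Apply Method A to zero-truncated counts (every observation is at least 1)."""
    n = len(counts)
    m = mean(counts)
    s2 = variance(counts)          # unbiased estimate, divisor n - 1
    p1 = counts.count(1) / n       # proportion of observations equal to 1
    return method_a(m, s2, p1)


# Population values for k = 2, w = 0.5: mean 8/3, variance 32/9, P(1) = 1/3.
print(method_a(8 / 3, 32 / 9, 1 / 3))
# A small hypothetical sample, only to show the call.
print(method_a_from_sample([1, 1, 2, 3, 5]))
```

Feeding in the exact population moments returns the parameters that generated them, which is the consistency property claimed for the method.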

w̄ and k̄ are consistent estimates of w and k but are not unbiased. In addition it is easy to
construct samples for which the equations give no acceptable solution for w̄ or k̄ (e.g. when
n_1 = 0, m > s²). These features apply to estimates from the first two moments, and the
maximum likelihood equations also. When n is large the effect of bias will be slight and
samples for which equations (5) cannot be solved will be very improbable.


3. EFFICIENCY OF METHOD A


By the use of differentials the asymptotic variances and covariances of the estimates can
be obtained, in the usual way, in terms of the variances and covariances of the first two
sample moments and the proportion of observations in the first class, which are

\[ nV(m) = \mu_2, \qquad nV(s^2) = \mu_4 - \mu_2^2, \qquad nV(n_1/n) = P(1-P), \]
\[ n\,\mathrm{cov}(m, s^2) = \mu_3, \qquad n\,\mathrm{cov}(m, n_1/n) = P(1-\mu'_1), \]
\[ n\,\mathrm{cov}(s^2, n_1/n) = P(1 - \mu_2 + \mu'^2_1 - 2\mu'_1). \]










W. Brass 61 








This leads after reduction to 
acs nw anette . . 
V(w) = nll —Py (yk + Py E+ 1) ++ P{—4-—3k + 2 + 4k + yk} + P(2+k)], 
V(k) = sa) (2k? + P{k — 3k? + 4nk + 5k? + yk} + P2{k? —k + 3 + 3yk}], } 


ny(1—P)? (nk+P) 
w(k+1) 
n(1—P)? (nk+P) 








cov (wk) = [2k + P{ —3k+ 2 + 4k + yk} + P2(k+7)]. 


(6) 


The asymptotic efficiency of this method of fitting will show how much information is 
lost when it is applied, in place of the maximum likelihood procedure, with large samples. 
The determinant of the variance-covariance matrix of the estimates is 


(k+1)w@ 
n*(1—P)? (nk+P)2’ 





(7) 
where 

G = 2nk? + kP{2+9— yk + 4? + 592k + y?k*} + P2{ — 2k + 6y + Tyk + yk? — 47k — 7}. 
The corresponding determinant of the maximum likelihood estimates is 


knw? 
n*(nk +P)? ((1—P)L—yP(1+inw/y)*]’ 
a” (r—1)tk! 
mor (k+r—1)!’ 


(8) 








where 


and the efficiency of the method is

\[ E = \frac{k^2\eta^3(1-P)^2}{(k+1)\,G\,[(1-P)L - \eta P(1+\ln w/\eta)^2]}. \tag{9} \]

Table 1 gives this efficiency for various values of k and the mean M of the complete
negative binomial distribution

\[ M = k\eta/w. \]


For fixed k, when M tends to zero, the distribution becomes concentrated in the first 
class and the efficiency tends to 100%. For fixed M, when k tends to zero, the efficiency 
tends to zero, but so slowly that no guidance is given to the levels, for values of k which 
would be met in practice. 

For fixed M when $k \to \infty$, $P \to P_\infty = M/(e^M - 1)$ and the distribution tends to the truncated
Poisson form. Then


1 _ Py 
7 [OP | [ee aor 
For fixed k when $M \to \infty$
1 (r—1)! 1)! k! 
i >2k+1)E or(k+r—1)!" 
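The Poisson limit of P quoted above can be checked numerically. The sketch below is an illustration under the parameterization used in this paper (complete mean $M = k\eta/w$ with $\eta = 1 - w$, so that $w = k/(k+M)$); the function names are hypothetical.

```python
import math


def first_class_proportion(k, M):
    """P = k*eta*w**k / (1 - w**k) for the zero-truncated negative binomial
    whose complete distribution has index k and mean M = k*eta/w."""
    w = k / (k + M)      # solves M = k(1 - w)/w for w
    eta = 1.0 - w
    wk = w ** k
    return k * eta * wk / (1.0 - wk)


def poisson_limit(M):
    """Limit of P as k tends to infinity: M / (e**M - 1)."""
    return M / math.expm1(M)


# P approaches the truncated Poisson value as k grows, for fixed M = 2.
for k in (1.0, 10.0, 100.0, 10000.0):
    print(k, first_class_proportion(k, 2.0), poisson_limit(2.0))
```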










These limits give useful guidance to the efficiencies for higher values of M and k and are
shown in the last row and column of the table.


Table 1. Percentage efficiency of estimation by Method A (modified moment)

   M \ k     0.5      1       2       3       4       5      10       ∞
   0.5      93.4    97.1    98.9    99.3    99.4    99.5    99.3    98.6
   1        88.0    93.6    97.1    98.1    98.4    98.6    98.0    97.4
   2        81.0    88.2    93.7    95.7    96.6    97.0    97.4    96.1
   5        70.9    78.8    86.5    90.4    92.7    94.2    97.2    98.5
   10       63.6    70.9    78.9    83.7    87.0    89.3    96.1   100.0
   ∞        22.7    38.8    57.5    67.6    73.9    78.2    88.0   100.0





Over a considerable region of values of the parameters the efficiency is high. For fixed
M, it rises with k to a maximum value only a little below 100%; beyond this, any fall is very
slight. The efficiency decreases towards the lower left-hand corner of the table as $\eta$ becomes
closer to one and the distribution widely spread.


Table 2. Percentage ratio of efficiency of estimation by Method A
to that of the moment method

   M \ k     0.5      1       2       3       4       5      10       ∞
   0.5     112.9   107.1   103.3   101.9   101.1   100.6    99.7    98.6
   1       117.9   110.5   105.1   102.8   101.6   100.8    98.8    97.4
   2       122.5   113.9   107.0   104.0   102.3   101.2    98.8    96.1
   5       126.9   117.2   109.2   105.8   104.0   102.8   100.5    98.5
   10      128.3   117.6   108.7   105.1   103.3   102.2   100.6   100.0
   ∞       100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0








The determinant of the variance-covariance matrix of the moment estimates is 


W(k+1)w%y [2(1—P)—9(k+1)P] 
n*(nk+P)* [1+ P{k+(k+1)Inw/y}]? 





(10) 


Division of this by the expression (7) gives the ratio of the efficiency of estimation by 
Method A to that of the moment method, 
k2n(1— P)®[2(1 — P)—9(k +1) P] 


ei TH As ab ED 11 
Gl1+ P{k+(k+1)Inw/y}}* ae. 

This ratio is shown as a percentage in Table 2. For fixed k it tends to 100% as M goes 
to zero or infinity. For fixed M it tends to 160% when k becomes very small. When k is 
large it approaches the values in the last column of Table 1, since the efficiency of the 
moment method then tends to 100%. Except for large k, when the ratio falls below 100% 













by a few per cent, Method A is rather more efficient than fitting by the first two moments. 
When k becomes very small it is substantially better but, in this region, the efficiency of 
both methods is low. 


4. EFFICIENCY OF ESTIMATION OF THE MEAN OF THE COMPLETE DISTRIBUTION 


Often when distributions are fitted, the main interest is not in the overall precision of the
method, but the efficiency of estimation of some particular parameter, i.e. of some function
of $\eta$ and k. When the overall efficiency is close to 100%, that for each function will also be
near this. When the overall efficiency is low, it does not follow that this is also true for the
function considered. The efficiencies of estimation, by Method A, of both $\eta$ and k, however,
become low when $\eta$ tends to one, the decrease being less rapid for the former.

A parameter which will usually be of particular importance is the mean of the complete
distribution, M. For example, this will give the mean breakages per susceptible cell, and
live births per woman exposed to risk, in the problems cited in the introduction. Special
consideration will be given to this parameter.

The estimate of M by Method A is obtained from

\[ \bar M = m - \frac{s^2 n_1}{m(n - n_1)}, \tag{12} \]


es k2y? P P \4j1 424 8 
and V(mM) = w*(nk + PP ! +G—P)9kt (=) pti al . (13) 
The variance of the maximum likelihood estimate is 


k®y*{L — 9 P(1 + In w/9)*] (14) 
w*(nk +P)? [(1—P) L—P(1+inw/y)?] 


and the efficiency of the estimation of M by Method A is then 
[L—nP(1+Inw/y)*] 


P P \?(1 4 = 3)7 
[(1—P)L—9P(1+Inw/n)*}|1+ pray + (=p) PTE al si 


M =m- (12) 











1-P 


This is shown as a percentage in Table 3, for the selected values of M and k. When $k \to 0$,
for a fixed M, the efficiency tends to zero, but very slowly. When k is greater than 0.5 the
efficiency exceeds 90% and over a large part of the region it is within a per cent or two of
100%.
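The Method A estimate of M follows from equations (5) because $M = k\eta/w$, which gives $m - s^2 n_1/\{m(n - n_1)\}$ once the estimates of w and k are substituted. The sketch below illustrates this with the sample values of §8. The count $n_1 = 49$ is an assumption inferred from $n_1/n = 0.1441$ and $n = 340$, and the function name is hypothetical.

```python
def mean_of_complete_distribution(m, s2, n1, n):
    """Method A estimate of M, the mean of the complete distribution."""
    return m - s2 * n1 / (m * (n - n1))


# Sample values from section 8; n1 = 49 is inferred, not quoted in the paper.
m, s2, n, n1 = 3.9912, 5.9734, 340, 49
M_hat = mean_of_complete_distribution(m, s2, n1, n)

# The same quantity reached through the estimates of w and k.
p1 = n1 / n
w_hat = (m / s2) * (1.0 - p1)
k_hat = (w_hat * m - p1) / (1.0 - w_hat)
print(M_hat, k_hat * (1.0 - w_hat) / w_hat)
```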

Table 3. Percentage efficiency of estimation of M by Method A

   M \ k     0.5      1       2       3       4       5      10       ∞
   0.5      97.0    99.0    99.7    99.8    99.7    99.7    99.3    98.6
   1        95.4    98.2    99.2    99.3    99.2    99.0    98.5    97.5
   2        93.6    97.0    98.2    98.3    98.2    98.0    97.7    97.1
   5        91.6    95.9    97.7    98.2    98.5    98.7    99.1    99.5
   10       91.3    96.4    98.6    99.2    99.5    99.7    99.9   100.0
   ∞       100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0













5. METHOD B. MODIFICATION OF MAXIMUM LIKELIHOOD EQUATIONS


Although fitting by the first two moments and the proportion in the first class has many 
desirable properties, its efficiency for low values of w is not sufficiently high for it to be 
preferred to the maximum likelihood method, in this region, very often. It appears worth 
while then to examine how the maximum likelihood equations may be modified to simplify 
their solution on the same principles as used above. 

If $w^k$ is eliminated from the maximum likelihood equations by the use of (3) and P
replaced by its sample value $n_1/n$ we obtain

\[ \bar w = \frac{\bar k + n_1/n}{\bar k + m}, \]
\[ \frac{m(\bar k + n_1/n)}{\bar k(m - n_1/n)} \ln\frac{\bar k + n_1/n}{\bar k + m} + \frac{1}{n}\sum_{t=1}^{R} n_t \sum_{s=0}^{t-1}\frac{1}{\bar k + s} = 0, \tag{16} \]

where the bars denote estimates and R is the highest value observed in the sample. $\bar k$ can
then be found from the second equation and $\bar w$ follows immediately from the first. This will
be called Method B.


The equation in k can be solved by iteration in exactly the same way as the corresponding 
equation, obtained when the complete negative binomial distribution is fitted by maximum 
likelihood methods: Fisher (1953). Although this iteration takes a little time, particularly 
when the distribution has a wide spread, it is less laborious than the procedure required for 
the maximum likelihood fitting of the truncated negative binomial. 
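A minimal sketch of the Method B calculation is given below, assuming the equation in k is written per observation as the sum of a logarithmic term and the double sum over the observed frequencies. The function names, the secant rule used for the iteration and the test frequencies are all assumptions made for illustration. The frequencies are generated from exact truncated negative binomial probabilities, so that the known index should be recovered; they are not the data of this paper.

```python
import math


def truncated_nb_pmf(k, w, r_max):
    """Zero-truncated negative binomial probabilities for r = 1..r_max."""
    eta = 1.0 - w
    p0 = w ** k                # untruncated probability of the zero class
    term, probs = p0, {}
    for r in range(1, r_max + 1):
        term *= (k + r - 1) / r * eta      # P(r) = P(r-1) * (k+r-1)/r * eta
        probs[r] = term / (1.0 - p0)
    return probs


def phi(k, freq):
    """Left-hand side of the Method B equation in k, per observation.
    freq maps each observed value t >= 1 to its frequency n_t."""
    n = sum(freq.values())
    m = sum(t * n_t for t, n_t in freq.items()) / n
    p1 = freq.get(1, 0) / n
    log_term = m * (k + p1) / (k * (m - p1)) * math.log((k + p1) / (k + m))
    sum_term = sum(n_t * sum(1.0 / (k + s) for s in range(t))
                   for t, n_t in freq.items()) / n
    return log_term + sum_term


def method_b(freq, k0, k1, tol=1e-10, max_iter=100):
    """Solve phi(k) = 0 by the secant rule, then obtain w from k."""
    f0, f1 = phi(k0, freq), phi(k1, freq)
    for _ in range(max_iter):
        if f1 == f0:           # flat secant: no further improvement possible
            break
        k2 = k1 - f1 * (k1 - k0) / (f1 - f0)
        if abs(k2 - k1) < tol:
            k1 = k2
            break
        k0, f0, k1, f1 = k1, f1, k2, phi(k2, freq)
    n = sum(freq.values())
    m = sum(t * n_t for t, n_t in freq.items()) / n
    w = (k1 + freq.get(1, 0) / n) / (k1 + m)
    return w, k1


# Hypothetical frequencies: exact expectations for k = 5, w = 0.56.
freq = {r: 1000.0 * p for r, p in truncated_nb_pmf(5.0, 0.56, 150).items()}
print(method_b(freq, 4.0, 4.5))
```

The secant rule needs two starting values and no derivative. In practice the Method A estimate and a rounded neighbour would serve as the starting pair.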


6. EFFICIENCY OF METHOD B


The asymptotic variances and covariance of the estimates calculated by Method B are 


aF@) = ESP) Te [P{nk(1 + P) + 29P + qk} {L +9 + ln w}? + 9°k(k-+ QP) (L+7)? 
— k?(L +9) + 29*kP(L +7) in wv}, 
= k2 
+(k-+P)?(L-+)—{yk— Pn wo}, 


n cov (wk) = ee [P{yk(1 +P) + 2yP + 47k} {1 + In w/y} {L+94+Inv} 
+9Pk(k + 2P)(L+)—9(gk-—Pinw)?—9P*(L+y) Inv), 
where T = (k+P)L+yP(1+mw/y). (17) 
The determinant of the variance-covariance matrix simplifies to 


a... {+= {k+P(k+2)}L+P{yk? +k+P(k+ 2)} {1 +n w/y}? 
n*(nk +P)? T? 0 4 
2 
+ 2ykP(1 +-Inw/y) In] (18) 
and the asymptotic efficiency is 


{(k+P)L+P(1+Inw/y)} (19) 
Bi(1—P)L—yP(1+inw/y)?}’ 


where B is written for the terms in square brackets in the expression (18). 










Table 4. Percentage efficiency of estimation by Method B

   M \ k     0.5      1       2       3       4       5      10       ∞
   0.5      93.8    96.1    97.4    97.8    98.0    98.1    98.4    98.6
   1        89.9    93.3    95.3    96.0    96.4    96.6    97.0    97.4
   2        85.1    89.5    92.5    93.6    94.2    94.6    95.4    96.1
   5        78.3    84.5    89.8    92.3    93.8    94.7    96.6    98.5
   10       73.6    81.5    90.6    94.6    96.6    97.7    99.4   100.0
   ∞       100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0

















This efficiency is shown as a percentage in Table 4 for the selected values of M and k.
For fixed k the efficiency of Method B tends to 100% both as M approaches zero and
infinity. When M is fixed and k becomes large the limit is the same as for the Method A
fitting. When k becomes small the efficiency also becomes small, but again so slowly that
this gives little guidance for values of practical importance.

When k is not too small (say > 2) the efficiency of Method B remains at a reasonable level
throughout (not less than 90% approximately). Only for the higher values of M, however,
is the efficiency greater than that of the very much simpler, modified moment, Method A.
At the lower values of k the efficiencies, in Table 4, are greater than those of the two moment
methods, but not high enough to suggest that this method of fitting would often be preferred
to the maximum likelihood procedure.


7. DISCUSSION


Of the two simplified methods presented A is very much the more valuable. It gives 
estimates of the parameters with very slight labour beyond that necessary for the calcula- 
tion of the first two moments, and with an efficiency which is high in the region which 
covers many of the cases met in practice. Even outside this region the ease of the calculation 
makes it useful for exploratory work and for finding first approximations to the estimates. 
In addition the very important parameter M is estimated with an efficiency only a little 
short of 100% except when k becomes small. The second method is of limited value, but 
when M is high and k not too small (i.e. P is small), it gives estimates which are only slightly 
less efficient than maximum likelihood ones and rather easier to calculate. 

These conclusions, of course, apply strictly only when the number of observations 
becomes very large. When the object of the fitting is to determine the form of the distribution, 
moderately large sample sizes will be required to justify an analysis. It seems fair to assume 
that, in such instances, the asymptotic theory will give reasonably good approximations 
to the true sampling variances, covariances and efficiencies. The situation may be very 
different if there are good reasons for assuming that observations will be distributed in the 
truncated negative binomial form, and this is fitted to small samples to obtain estimates 
of the parameters or functions of the parameters. In such circumstances, the bias in the
estimates by all the methods of fitting mentioned above, both standard and simplified, 
may be considerable and it appears possible that the asymptotic values of the variances 




and covariances will not be very good approximations. It is easy to obtain the leading terms
in $n^{-1}$ in the biases, but the corrections to the variances are very complicated. This problem
is not considered in the present paper.


8. EXAMPLE 


The simplified methods of fitting are illustrated on the data below which were collected by 
the East African Medical Survey in the Kwimba district of Tanganyika territory. The 
observations are of the number of children ever born to a sample of mothers over 40 years 
of age. 





No. of children per mother 





| 
“Sy 
| 


4 7 8 | 9 | 10 | 11 | 12 | Total 











| 
} 
| 





~1 


3 18 7 


mothers 














| 
| 
| 
} 
| 
| 





| 
| | | 
| | 
| | | 
z | | 
a | 41 | 43 | 23 | 18 
| | | 


7 a7 340 





m = 3.9912, $s^2$ = 5.9734, $n_1/n$ = 0.1441.


Method A 


When these sample estimates of the moments and the proportion in the first class are
substituted in equation (5) we have,

\[ \bar w = \frac{3.9912}{5.9734}(1 - 0.1441) = 0.572, \]
\[ \bar k = \frac{0.572 \times 3.9912 - 0.1441}{1 - 0.572} = 5.00. \]


Approximations to the variances and covariance of these estimates are obtained from 
equation (6) with the estimated w and k in place of the true values. This gives 


V($\bar w$) = 0.003104, V($\bar k$) = 1.3902, cov($\bar w$, $\bar k$) = 0.06472.


Method B 
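It is convenient to write

\[ \phi = \frac{m(k + n_1/n)}{k(m - n_1/n)}\ln\frac{k + n_1/n}{k + m} + \frac{1}{n}\sum_{t=1}^{R} n_t\sum_{s=0}^{t-1}\frac{1}{k+s}, \tag{20} \]
\[ \phi' = \frac{d\phi}{dk} = -\frac{m\,n_1/n}{k^2(m - n_1/n)}\ln\frac{k + n_1/n}{k + m} + \frac{m}{k(k+m)} - \frac{1}{n}\sum_{t=1}^{R} n_t\sum_{s=0}^{t-1}\frac{1}{(k+s)^2}. \tag{21} \]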


The iterative procedure is to calculate $\phi$ and $\phi'$ for some value of k; a second value of k
which will make $\phi$ closer to zero is then found by the aid of $\phi'$, on the assumption that the
relation between $\phi$ and k is roughly linear over a short range. Further improved
approximations are obtained, by linear interpolation, from the values of $\phi$ for the preceding two
estimates of k.



















    k         φ            φ'
   5.0     -0.00064     -0.00258
   4.75    +0.00008
   4.78     0.00000





The first approximation to k is taken as 5.0 from the estimate by Method A (normally it
would be necessary to round this estimate to a convenient value). $\phi$ and $\phi'$ can then be
calculated quite rapidly. The second approximation to k is

\[ 5.0 - \frac{-0.00064}{-0.00258} = 4.75. \]

One linear interpolation between 5.0 and 4.75 gives 4.78, for which $\phi$ is zero to the accuracy
which is justified by the number of digits calculated for m and $n_1/n$. Because of the size
of the sampling errors the retention of further digits would only be useful with a much
larger number of observations.
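The two steps of the iteration can be retraced from the tabulated values alone. The sketch below uses only the three values of k and the values of $\phi$ and $\phi'$ quoted in the text; the variable names are chosen here for illustration.

```python
# First row of the table: starting value and the function and derivative there.
k0, phi0, dphi0 = 5.0, -0.00064, -0.00258

# Step using the derivative, rounded to the two decimals kept in the text.
k1 = round(k0 - phi0 / dphi0, 2)

# Linear interpolation between the two preceding approximations.
phi1 = 0.00008
k2 = k1 - phi1 * (k1 - k0) / (phi1 - phi0)

print(k1, round(k2, 2))
```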


From the value of $\bar k$, $\bar w$ can be found directly from the first equation in (16) giving

$\bar w$ = 0.561, $\bar k$ = 4.78.

By the substitution of these values in equation (17) the estimated variances and
covariances are obtained,

V($\bar w$) = 0.003054, V($\bar k$) = 1.2520, cov($\bar w$, $\bar k$) = 0.06088.


These estimates, compared with the solutions of the maximum likelihood equations, 
are then: 

















   Method                      w                  k
   A                     0.572 ± 0.056      5.00 ± 1.18
   B                     0.561 ± 0.055      4.78 ± 1.12
   Maximum likelihood    0.565 ± 0.054      4.86 ± 1.12





The estimates by the various procedures differ little, and the efficiencies of Methods A 
and B are both about 95%. It should be noted that since the values of w and k used in the 
calculations are not the same for each method the efficiencies of estimation of parameters 
are not given directly by these comparisons. 
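The standard errors attached to the Method A and Method B estimates are the square roots of the estimated variances quoted earlier in this section, as the short check below illustrates. The correlation between the two estimates is also computed from the quoted covariance; this quantity is not given in the paper and is added here only as an illustration.

```python
import math

# Estimated variances and covariance quoted in the example: (V(w), V(k), cov).
variances = {"A": (0.003104, 1.3902, 0.06472),
             "B": (0.003054, 1.2520, 0.06088)}

standard_errors, correlations = {}, {}
for method, (var_w, var_k, cov_wk) in variances.items():
    standard_errors[method] = (round(math.sqrt(var_w), 3),
                               round(math.sqrt(var_k), 2))
    correlations[method] = cov_wk / math.sqrt(var_w * var_k)

print(standard_errors)
print(correlations)
```

The correlations are close to one for both methods, which is one reason why the separate standard errors of w and k do not by themselves determine the precision of a function of the two parameters such as M.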


9. EXTENSIONS OF METHOD FOR SIMPLIFYING ESTIMATING EQUATIONS 


In the preceding investigation the awkward exponential type term in the estimating 

equations was eliminated by the use of the proportion of observations in the class next to 

the point of truncation. This method can be extended to negative binomial distributions 

truncated at any point in the lower or upper tail, and also, in the same conditions, to 


Poisson and binomial (known or unknown index) distributions. For the truncated Poisson
distribution, simplification by this procedure leads to methods of estimation which have
been studied by Moore (1952, 1954) and Plackett (1953). The relative advantages of the
simplifications introduced by this technique in particular cases can only be assessed by
comparisons of efficiency and ease of calculation. In general, however, it should be more
useful than simplified methods of estimation based on the use of moments of a higher
order such as those discussed by David & Johnson (1952) and Rider (1955).


REFERENCES 


Brass, W. (1957). Models of birth distributions in human populations. Paper for the 30th session of
the International Statistical Institute.

David, F. N. & Johnson, N. L. (1952). The truncated Poisson. Biometrics, 8, 275.

Fisher, R. A. (1941). The negative binomial distribution. Ann. Eugen., Lond., 11, 182.

Fisher, R. A. (1953). Note on the efficient fitting of the negative binomial. Biometrics, 9, 197.

Moore, P. G. (1952). The estimation of the Poisson parameter from a truncated distribution.
Biometrika, 39, 247.

Moore, P. G. (1954). A note on the truncated Poisson distribution. Biometrics, 10, 402.

Plackett, R. L. (1953). The truncated Poisson distribution. Biometrics, 9, 485.

Rider, P. R. (1955). Truncated binomial and negative binomial distributions. J. Amer. Statist. Ass.
49, 147.

Sampford, M. R. (1955). The truncated negative binomial distribution. Biometrika, 42, 58.










[ 69 ] 


THE INTERPRETATION OF THE EFFECTS OF 
NON-ADDITIVITY IN THE LATIN SQUARE 


By D. R. COX 
Birkbeck College, University of London 


1. Introduction. Wilk & Kempthorne (1957) have studied the randomization theory of the 
Latin square, paying particular attention to the effects on the interpretation of the conven- 
tional analysis of variance of the absence of unit-treatment additivity, a topic first discussed 
by Neyman (1935). Wilk & Kempthorne’s paper is a report on part of an extensive investiga- 
tion of the main experimental arrangements, and while the discussion in the present note 
is concerned primarily with the Latin square, the results are in fact of general applicability. 

Some of Wilk & Kempthorne's conclusions need to be recalled, in particular their results
for the randomization expectations of $M_t$ and $M_r$, the mean squares for treatments and for
residual. Similar results apply to randomization variances and estimated variances of
treatment contrasts. Three important conclusions are:

(i) Suppose that there is unit-treatment additivity, i.e. that the observation obtained by
applying a particular treatment to a particular experimental unit is the sum of a quantity
depending on the unit plus a constant characteristic of the treatment. Then the usual analysis
of variance is unbiased, so that in particular $E(M_t) \ge E(M_r)$, with equality if and only if all
treatments are equivalent. For the null case, see Fisher (1951), Welch (1938), Pitman (1938).

(ii) Suppose that there is not additivity in the sense of (i), i.e. that the treatment effects
vary from unit to unit. Then usually $E(M_t) < E(M_r)$ when the average treatment effects are
zero, the average treatment effects being calculated over all units used in the experiment,
or over a finite population of units from which those used are randomly drawn. To put the
point slightly differently, $(M_t - M_r)/n$ is not, for an n x n square, an unbiased estimate of
the component of variation between treatments, defined in a natural way in terms of the
average treatment effects just mentioned.

(iii) The statistic $(M_t - M_r)/n$ is, however, an unbiased estimate of a quantity $\Sigma_t$, defined
as a certain combination of the population components of variation for treatments,
treatments x rows, treatments x columns and treatments x rows x columns. In fact, in
unpublished work, Wilk has shown that randomization expectations of mean squares can,
for many designs, be expressed simply in terms of appropriate $\Sigma$'s.

In this note we discuss the practical interpretation of the $\Sigma$'s and a sense in which the
Latin square is always unbiased.


2. A simple situation. Suppose that we have two finite populations $x_1, \ldots, x_M$ and
$y_1, \ldots, y_N$. For example, the x's might be the heights, measured without error, of a group
of trees at site X and the y's the heights of a group of the same species at a different site Y.
Consider the following three questions about the means $x_{.}$, $y_{.}$ of the two finite populations.
We shall, for convenience, state the questions in the language of significance testing,
although similar estimation problems could be considered.

(A) Is $x_{.} = y_{.}$?

(B) Do $x_{.}$, $y_{.}$ differ by more than would be expected if the x's and the y's were random
samples from the same infinite population?








70 Effects of non-additivity in the Latin square 

(C) Do $x_{.}$, $y_{.}$ differ by more than would be expected if the x's and the y's had been
formed by selecting a random permutation of the combined set $\{x_1, \ldots, x_M; y_1, \ldots, y_N\}$?

It is well known that if M and N are large, the answers to B and C are nearly identical. 
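Question C can be answered exactly for small populations by enumerating every way of labelling the combined set. The sketch below is an illustration only: the function name is hypothetical, the two-sided criterion (absolute difference of means) is a choice made here, and the numbers in the usage line are invented.

```python
from fractions import Fraction
from itertools import combinations


def permutation_p_value(xs, ys):
    """Proportion of relabellings of the combined set whose absolute
    difference of means is at least as large as the one observed."""
    pooled = list(xs) + list(ys)
    m = len(xs)
    n = len(pooled) - m
    total = sum(pooled)
    observed = abs(Fraction(sum(xs)) / m - Fraction(total - sum(xs)) / n)
    hits = count = 0
    for idx in combinations(range(len(pooled)), m):
        sx = sum(pooled[i] for i in idx)
        diff = abs(Fraction(sx) / m - Fraction(total - sx) / n)
        hits += diff >= observed       # True counts as 1
        count += 1
    return Fraction(hits, count)


# Invented heights for two small groups of trees.
print(permutation_p_value([1, 2, 3], [4, 5, 6]))
```

The enumeration grows combinatorially with M and N, so for larger populations a random sample of relabellings would normally replace the full enumeration.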

It is rather difficult to specify precisely the status of these questions. A is a simple direct
matter with no probabilistic aspects. Except in unusual circumstances we shall find $x_{.} \ne y_{.}$.
Questions B and C, on the other hand, are entirely hypothetical, in that we are starting
with two finite populations and no objective sampling or permutation procedure is involved.
However, if we regard the variation within populations as haphazard, B and C do seem
useful questions to consider. If, according to B and C, $x_{.}$ and $y_{.}$ do not differ significantly
at an interesting level, the data are in this respect consistent with having been generated by
a single random process, implying that it may not be profitable to regard the observations
as suggesting or supporting possible physical explanations of the difference or extensions of
the difference to further individuals.

Thus, consider the example above and suppose that site X appears to differ from site Y
in one particular respect R and is otherwise similar to Y. If $x_{.}$, $y_{.}$ do not differ significantly
according to B and C, it seems unsound to regard the data, considered alone, as supporting
the idea that R is responsible for a difference in mean heights and as suggesting that future
similar groups of trees, differing by R, will show a difference in mean height similar to that
observed. On the other hand, if $x_{.}$ and $y_{.}$ do differ significantly, an essentially non-statistical
element is involved in inferring that R is a cause of the difference and that similar differences
will be observed in the future. Yet the inference does have some cogency. The issues involved
here are general ones arising when probabilistic methods are applied to data which have
not been obtained by randomization and which do not belong to clearly defined random
sequences.

As remarked previously, the answers to B and C are nearly identical when M, N are 
large. The objection to B is that it involves reference to an infinite population which is 
an artificial construct, is not clearly defined, and which is in general certainly not a super- 
population of individuals to which one would like to apply the conclusions of the analysis. 
In view of possible confusion that this may cause, we shall work in this paper with C, 


which is mathematically clearly defined and which involves reference to no observations 
other than those actually obtained. 


3. The one-way set-up. Suppose now that we have K finite populations each of N
individuals, the members of the ith population being $x_{i1}, \ldots, x_{iN}$, with mean $x_{i.}$. Wilk &
Kempthorne have defined components of variation between and within populations as

\[ \sigma_b^2(x) = \frac{1}{K-1}\sum_i (x_{i.} - x_{..})^2, \tag{1} \]

\[ \sigma_w^2(x) = \frac{1}{K(N-1)}\sum_i\sum_j (x_{ij} - x_{i.})^2, \tag{2} \]

where $x_{..}$ is the mean of the $x_{i.}$. These are natural descriptive measures of the population
variation: no sampling is involved.

Now in the spirit of C of § 2, let us ask for a measure of the variation between populations
that reduces to $\sigma_b^2(x)$ when there is no variation within populations and has expectation
zero when the whole set $\{x_{ij}\}$ is permuted at random into K sets of N, the quantity thus
measuring how much more variation there is between the population means than would
be expected under random permutation.











D. R. Cox 71 


Symmetry considerations require the use of $\sigma_b^2(x) - D\sigma_w^2(x)$, where D is a constant to be
determined. If we define $\sigma_b^2(.)$, $\sigma_w^2(.)$ according to (1) and (2), for every regrouping of the
x's into K sets of N, and if $E_P$ denotes expectation over all permutations of the $\{x_{ij}\}$, it is
easy to show that

\[ E_P(\sigma_b^2(.)) = (1/N)\,E_P(\sigma_w^2(.)). \]

Thus we need D = 1/N in order that our measure shall have zero expectation under
permutation of the $\{x_{ij}\}$. Hence we set

\[ \Sigma_b(x) = \sigma_b^2(x) - (1/N)\,\sigma_w^2(x), \tag{3} \]

and call $\Sigma_b(x)$ the component of effective variation between populations. Note that $\Sigma_b(x) < 0$
if the population means differ by less than would be expected under random permutation.
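The permutation property that fixes D = 1/N can be verified exactly on a small set by enumerating all regroupings. The sketch below assumes the divisors K - 1 and K(N - 1) in the two components of variation; the six values are invented and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def components(groups):
    """Between and within components of variation for K groups of N values,
    with divisors K - 1 and K(N - 1)."""
    K, N = len(groups), len(groups[0])
    means = [Fraction(sum(g), N) for g in groups]
    grand = sum(means) / K
    between = sum((mu - grand) ** 2 for mu in means) / (K - 1)
    within = sum((x - mu) ** 2
                 for g, mu in zip(groups, means) for x in g) / (K * (N - 1))
    return between, within


def effective_variation(groups):
    """Between component minus (1/N) times the within component."""
    between, within = components(groups)
    return between - within / len(groups[0])


# Invented values, regrouped in every possible order into K = 2 sets of N = 3.
values, K, N = [1, 2, 4, 7, 11, 16], 2, 3
results = [effective_variation([p[i * N:(i + 1) * N] for i in range(K)])
           for p in permutations(values)]
mean_over_permutations = sum(results) / len(results)
print(mean_over_permutations)
```

Exact rational arithmetic is used so that the zero expectation appears as an exact zero rather than as a small rounding residue.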

Suppose now that a random sample of size n is drawn without replacement from each of
the K finite populations and the usual analysis of variance made, giving mean squares
$M_b$, $M_w$. Let E denote expectation in repeated sampling. Then

\[ E(M_w) = \sigma_w^2(x), \tag{4} \]
\[ E(M_b) = \sigma_w^2(x)(1 - n/N) + n\sigma_b^2(x) = \sigma_w^2(x) + n\Sigma_b(x). \tag{5} \]

The same formulae apply if observations are taken only from a random sample of k out of
the K populations. Thus, if we estimate a component of variance between populations by
the usual infinite model formula $(M_b - M_w)/n$, we obtain an unbiased estimate of the
component of effective variation between populations.
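The expectations of the two mean squares under sampling without replacement can be checked by enumerating every possible set of samples from small populations. The sketch below takes the components of variation with divisors K - 1 and K(N - 1) and uses three invented populations of four values each; all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def components(groups):
    """Between and within components of variation for equal-sized groups."""
    K, N = len(groups), len(groups[0])
    means = [Fraction(sum(g), N) for g in groups]
    grand = sum(means) / K
    between = sum((mu - grand) ** 2 for mu in means) / (K - 1)
    within = sum((x - mu) ** 2
                 for g, mu in zip(groups, means) for x in g) / (K * (N - 1))
    return between, within


def mean_squares(samples):
    """Usual one-way analysis of variance mean squares (between, within)."""
    between, within = components(samples)
    return len(samples[0]) * between, within


# Invented populations: K = 3 populations of N = 4 values, samples of n = 2.
populations = [[1, 3, 8, 12], [2, 5, 6, 7], [0, 4, 9, 15]]
N, n = 4, 2
all_samples = list(product(*(combinations(pop, n) for pop in populations)))
squares = [mean_squares(s) for s in all_samples]
expected_between = sum(b for b, _ in squares) / len(squares)
expected_within = sum(w for _, w in squares) / len(squares)

sigma_b2, sigma_w2 = components(populations)
effective = sigma_b2 - sigma_w2 / N
print(expected_within == sigma_w2,
      expected_between == sigma_w2 + n * effective)
```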


4. The two-way set-up. Suppose now that we have a population set-up with R rows and
C columns, and that the value in the ith row and jth column is $x_{ij}$. Wilk & Kempthorne
define population components of variation for rows x columns, for rows and for columns by

\[ \sigma_{RC}^2 = \frac{1}{(R-1)(C-1)}\sum_i\sum_j (x_{ij} - x_{i.} - x_{.j} + x_{..})^2, \tag{6} \]

\[ \sigma_R^2 = \frac{1}{R-1}\sum_i (x_{i.} - x_{..})^2, \tag{7} \]

\[ \sigma_C^2 = \frac{1}{C-1}\sum_j (x_{.j} - x_{..})^2, \tag{8} \]

where a dot denotes an average. To measure the population components of effective
variation between say rows, it is natural to construct a quantity that is unaffected by constant
differences between columns and this means taking a combination of $\sigma_R^2$, $\sigma_{RC}^2$. If we further
require that our measure has expectation zero when the separate columns of $\{x_{ij}\}$ are
permuted in all possible ways, we are led to the definitions

\[ \Sigma_R = \sigma_R^2 - (1/C)\,\sigma_{RC}^2, \tag{9} \]
\[ \Sigma_C = \sigma_C^2 - (1/R)\,\sigma_{RC}^2, \tag{10} \]

and for completeness we set

\[ \Sigma_{RC} = \sigma_{RC}^2. \tag{11} \]


Wilk & Kempthorne give results analogous to (4) and (5) to the effect that if r rows and
c columns are drawn randomly without replacement and the corresponding observations
analysed, unbiased estimates of $\Sigma_R$ and $\Sigma_C$ may be obtained by using the ordinary infinite
model component of variance formulae just as if both R and C were infinite.
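The requirement that the row measure has expectation zero when the separate columns are permuted in all possible ways can be checked by enumeration. The sketch below assumes the usual divisors R - 1 and (R - 1)(C - 1) for the row and row x column components; the small array is invented and the names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product


def row_and_interaction_components(x):
    """Row and row x column components of variation for an R x C array."""
    R, C = len(x), len(x[0])
    row = [Fraction(sum(r), C) for r in x]
    col = [Fraction(sum(x[i][j] for i in range(R)), R) for j in range(C)]
    grand = sum(row) / R
    s_r = sum((a - grand) ** 2 for a in row) / (R - 1)
    s_rc = sum((x[i][j] - row[i] - col[j] + grand) ** 2
               for i in range(R) for j in range(C)) / ((R - 1) * (C - 1))
    return s_r, s_rc


def effective_row_variation(x):
    """Row component minus (1/C) times the row x column component."""
    s_r, s_rc = row_and_interaction_components(x)
    return s_r - s_rc / len(x[0])


# Invented array stored by columns: R = 3 rows, C = 2 columns.
columns = [[3, 1, 4], [1, 5, 9]]
results = []
for cols in product(*(permutations(c) for c in columns)):
    # Rebuild the R x C array from one permutation of each column.
    x = [[cols[j][i] for j in range(len(cols))] for i in range(len(cols[0]))]
    results.append(effective_row_variation(x))
mean_over_permutations = sum(results) / len(results)
print(mean_over_permutations)
```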












5. The Latin square. Consider now the model for the Latin square. Let the experimental
units be set out in the R x C array of § 4 and suppose that there is a conceptual observation
$x_{ijk}$ that would be obtained if the kth treatment, k = 1,...,n, were applied to the (i, j)th
unit. Actual observations are obtained by selecting n rows randomly from the R, n columns
randomly from the C, applying a randomized Latin square to the $n^2$ units so defined and
recording the corresponding x's. The definition of components of effective variation for this
three-dimensional set is more difficult and requires less direct arguments.

First, ordinary components of variation may be defined by formulae such as

\[ \sigma_R^2 = \frac{1}{R-1}\sum_i (x_{i..} - x_{...})^2, \tag{12} \]
1 . 

OF J EAC ae GAD, gs Be , 

cat TE | aa ee re hi (13) 


etc. It seems reasonable to require that our measure of effective variation for say rows x 
treatments should be unaffected by arbitrary changes in rows, treatments, rows x columns 
and columns x treatments, i.e. in the effects not involving rows x treatments. We can ensure 
this by considering population residuals eliminating the effects just mentioned, i.e. 


\[ z_{ijk}(Rt) = x_{ijk} - x_{ij.} - x_{.jk} + x_{.j.}. \tag{14} \]


A quadratic form in the quantities (14) with the requisite symmetry is a combination of
$\sigma_{Rt}^2$ and $\sigma_{RCt}^2$, and hence we consider measures of effective variation of the form $\sigma_{Rt}^2 - F\sigma_{RCt}^2$,
where F is a constant to be determined. If now the expectation is to be zero when the
$\{z_{ijk}(Rt)\}$ are permuted completely randomly, we find F = 1/(RCn) and hence we put


1 
Ly = OR RGn ORO (15) 


with similar definitions for the other two-factor interactions. 

Now consider the corresponding definition for the main effects, for example the main 
effect of treatments. To ensure that our measure is unaffected by those variations which 
do not involve treatments, define residuals by 


\[ z_{ijk}(t) = (x_{ijk} - x_{...}) - (x_{ij.} - x_{...}) \tag{16} \]


and hence consider expressions of the form $\sigma_t^2 - G\sigma_{Rt}^2 - H\sigma_{Ct}^2 - J\sigma_{RCt}^2$, where G, H, J are to
be determined. If the $\{z_{ijk}(t)\}$ are permuted randomly, the expectation is
( ; @ fF ) Lzi,,(t) 


7) RO(n=1) 7) 


RC C0 R- 
and hence we require RG +CH+RCJ = 1. (18) 
Now the quantity used by Wilk & Kempthorne has G = 1/R, H = 1/C, J = -1/(RC) and
so satisfies (18), but we need two extra conditions in order to establish their quantity
uniquely from the present considerations.
This can be done by requiring that if, say, the column classification is nugatory, so that
$x_{ijk} = x_{i.k}$, the formulae of § 4 should be recovered, i.e. that expectation zero should be
obtained under the particular set of permutations considered there. Thus G = 1/R and a







similar requirement on rows gives H = 1/C. (18) then requires J = -1/(RC). Thus we
put

\[ \Sigma_t = \sigma_t^2 - \frac{1}{R}\sigma_{Rt}^2 - \frac{1}{C}\sigma_{Ct}^2 + \frac{1}{RC}\sigma_{RCt}^2, \tag{19} \]

with similar definitions for $\Sigma_R$, $\Sigma_C$.

Wilk & Kempthorne show that the expectation of $(M_t - M_r)/n$ is $\Sigma_t$ so that the
conventional analysis of the Latin square gives an unbiased estimate of the component of effective
variation between treatments in the presence of arbitrary treatment-unit interactions.


6. Discussion. The main issue for discussion concerns whether the interpretation put
upon $\Sigma_t$ is of sufficient interest to make the hypothesis $\Sigma_t = 0$ of practical scientific
importance comparable to or greater than that of the null hypothesis $\sigma_t = 0$. If $\Sigma_t$ is considered
important, there is a sense in which the Latin square is unbiased, and in which the residual
mean square estimates an appropriate error for treatment contrasts, whether or not there
is unit-treatment additivity. This is the view put forward here, although there is certainly
need for further discussion of the reasoning involved.

This is, of course, in no way to say that treatment-unit interactions should be disregarded 
in the design and interpretation of experiments. On the contrary, if substantial variations 
in treatment effect from unit to unit do occur, one’s understanding of the experimental 
situation will be very incomplete until the basis of this variation is discovered and any 
extension of the conclusions to a general set of experimental units will be hazardous. The 
mean treatment effect, averaged over all units in the experiment, or over the finite popula- 
tion of units from which they are randomly drawn, may in such cases not be too helpful. 
Particularly if appreciable systematic treatment-unit interactions are suspected, the 
experiment should be set out so these may be detected and explained. 

But suppose that we do decide to look at average treatment effects. The situation is 
quite parallel to the comparison of two finite populations discussed in § 2. For each experi- 
mental unit there is a conceptual true difference between each pair of treatments and the 
hypothesis a, = 0 is analogous to question A in that it is concerned with whether these 
differences all average out to zero over the finite population. The hypothesis =, = 0 is 
analogous to question C (or B) in that it is concerned with whether the grouping of the 
differences is significant among a set of permutations of the effects. If we consider that C 
(or B) is often the more useful question to consider statistically in § 2, the same will be true 
in the more elaborate situation. The treatment differences averaged over the finite population
remain always perfectly definite quantities and in some cases may be the only things
requiring consideration, as in those rare cases in which the sole units to which it is required 
to apply the conclusions about the treatments form a finite population from which the units 
used are randomly drawn. 


This work was done at the I.M.S. Summer Institute, 1957, at Boulder, Colorado and
support from the National Science Foundation is gratefully acknowledged. I wish also to
thank Prof. O. Kempthorne for stimulating discussions and for helpful critical comments
on the paper.

REFERENCES 


Fisher, R. A. (1951). Design of Experiments, 6th ed. Edinburgh: Oliver and Boyd.
Neyman, J. (1935). Suppl. J.R. Statist. Soc. 2, 108.
Pitman, E. J. (1938). Biometrika, 29, 332.
Welch, B. L. (1938). Biometrika, 29, 21.
Wilk, M. B. & Kempthorne, O. (1957). J. Amer. Statist. Ass. 52, 218.










QUANTAL RESPONSES TO MIXTURES OF POISONS UNDER 
CONDITIONS OF SIMPLE SIMILAR ACTION—THE 
ANALYSIS OF UNCONTROLLED DATA 


By J. R. ASHFORD 


Pneumoconiosis Field Research, National Coal Board 


This paper is concerned with the estimation of the toxicity of individual poisons which may be admin- 
istered singly or jointly to a population of living organisms under conditions of simple similar action, 
with particular reference to the analysis of uncontrolled data of the kind frequently encountered in 
studies of human populations. A brief description is given of the methods applied in the analysis of
quantal responses to a single poison and the extension of these techniques to mixtures of poisons is 
examined. A model to represent the action of mixtures of poisons is derived and the application of 
maximum likelihood and minimum logit χ² estimation is discussed. It is shown that both procedures
involve the use of an iterative process to calculate the estimates of the various parameters. With the
Newton-Raphson process of successive approximation, the calculation of the minimum logit χ² estimates
is, in general, less complicated than the evaluation of the maximum likelihood estimates. If,
however, the individual tolerances are assumed to follow a logistic distribution the difference is 
marginal. In view of the close agreement between the logistic function and other functions (such as 
the normal) which have been proposed to describe the tolerance distribution it is concluded that, as 
the minimum logit χ² procedure offers no appreciable advantages (either theoretical or practical) over
the maximum likelihood procedure under the assumption of a logistic distribution of tolerances, the 
latter method is to be preferred. The calculations involved are, however, likely to be too complex for 
the conventional methods of statistical computing, although the use of an automatic digital computer
offers a practicable method of analysis. The application of the proposed procedures is illustrated
by data relating to the prevalence of pneumoconiosis amongst coal miners at one of the collieries 
operated by the National Coal Board. 


1. INTRODUCTION 


Research in industrial medicine is frequently directed towards the assessment of the hazard 
associated with a particular process or environment. Under certain circumstances this
hazard may lead to a permanent deterioration in the health of the men concerned and it is 
not possible to make use of the conventional experimental procedures. In addition, the 
effect of exposure may be very slow to appear and the possibility of carrying out an experi- 
ment under controlled conditions is ruled out. 

When these conditions apply, investigations must be restricted to the study of the 
relationship between past exposure and resulting medical condition amongst men who have 
already been subject to the hazard. In general, it is found that the majority of such in- 
dividuals have worked at more than one occupation during their industrial careers. Dif- 
ferent occupations commonly represent different levels of the hazard and the total exposure 
of an individual may well be made up of a number of distinct components, corresponding 
to the periods spent in the various occupations he has followed. 

In this paper I discuss the situation where the medical condition resulting from exposure 
to the hazard takes the form of a quantal response, i.e. each individual can be classified as 
having manifested a certain characteristic reaction or not. It is also assumed that the 
various levels of the hazard affect the same physiological system and do not interact. This
is equivalent to quantal responses to a mixture of poisons under conditions of ‘similar joint 
action without interaction’ or ‘simple similar action’, in the conventional terms of bio- 
logical assay. The characteristic feature of the problem is, however, that the experimental 













conditions are not subject to control by the investigator and it is thus not possible to make 
any deliberate choice of the levels of dosage. 

A comprehensive review of models for the joint action of poisons has been given by 
Plackett & Hewlett (1952), but these authors consider that, except in special cases, the 
analysis of uncontrolled data is not worthwhile. The purpose of this paper is to consider 
and compare the various alternative methods of estimating the toxicity of individual 
poisons which may be administered singly or jointly to a population of living organisms 
under conditions of simple similar action, and to derive a method of analysis which is 
applicable under the most general experimental conditions. 


2. THE ACTION OF A SINGLE POISON 


The statistical theory relating to quantal responses to single poisons is well established and 
the application of the standard techniques is familiar in many branches of biological 
research. The approach normally adopted consists essentially of the formulation of a hypo- 
thetical model to describe the action of the poison, in terms of the relationship between the 
dose and the probability that an individual organism selected at random from the population 
will manifest the characteristic response when this dose is applied. 

The ‘tolerance’ of a particular organism is defined as that dose which would be just 
sufficient to produce the response. Thus, for any dose exceeding the tolerance the subject 
would respond, whereas for any lesser dose it would not respond. The individual tolerances 
may be expected to show some variation from one organism to another, and it is therefore 
necessary to consider the distribution over the whole population. If the poison is applied 
at dose x the probability of response may be expressed in the form, 


$$p(x) = \int_{0}^{x} h(\lambda)\, d\lambda, \qquad (1)$$


where $h(\lambda)$ is the frequency function of the distribution of the individual tolerances. In
practice this distribution may be markedly skew, but it is often possible, by means of a
transformation, say f(x), of the dose x, to obtain a tolerance distribution in terms of the
transformed dose f(x) which is symmetrical in form. It is usual to assume that the parameters
contained in f(x) are chosen in such a way that the tolerance distribution is of some
specified form, say $\phi(\theta)$, having zero mean and unit variance and, in general, covering the
range of values of f(x) from $-\infty$ to $\infty$. Under these circumstances f(x) is termed an
‘equivalent deviation’ and the probability of response at dose x may be expressed in the form


$$p(x) = \int_{-\infty}^{f(x)} \phi(\theta)\, d\theta. \qquad (2)$$


Experience has shown that the equivalent deviation may commonly be represented by 
a linear function of the logarithm of the dose, of the form 


$$f(x) = a + b \log x, \qquad (3)$$


where the two parameters a and b characterize the response of the population to the
particular poison applied. In general, different poisons would lead to different values of
a and b.

Various mathematical forms have been suggested to represent the distribution of toler- 
ances. The normal distribution was originally proposed by Fechner (1860), in connexion 
with psychometric data. The first reference to the use of this distribution in biological assay 










was by Gaddum (1933) and the assumption of a normal distribution of the logarithm of the
tolerance dose has since been made for a wide variety of data relating to quantal responses
to a single poison. The use of the logistic function to represent the distribution of tolerance
values has also been suggested and applications have been described by Wilson & Worcester
(1943) and Berkson (1944). Other expressions such as the ‘angle’ function (Knudson
& Curtiss, 1945) P = sin² f and the rectangular function have been considered, but for the
most part, analyses of quantal responses to a single poison are carried out on the assumption
of a normal or logistic distribution of tolerances.

Under the assumption of a normal distribution of tolerance values of the equivalent 
deviation the probability of response at dose x is

$$p(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{f(x)} \exp\{-\tfrac{1}{2}\theta^{2}\}\, d\theta. \qquad (4)$$

The corresponding expression* for the logistic distribution is

$$p(x) = [1 + \exp\{-f(x)\}]^{-1}. \qquad (5)$$
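
As a rough illustration of equations (3), (4) and (5), the following sketch evaluates the probability of response for a single poison under both tolerance distributions. The parameter values, the doses and the choice of base-10 logarithms are assumptions made purely for the example and are not taken from the paper.

```python
import math

def equivalent_deviation(dose, a, b):
    """Equation (3): f(x) = a + b*log(x). Base 10 is an assumption here."""
    return a + b * math.log10(dose)

def p_normal(f):
    """Equation (4): standard normal integral from -infinity to f."""
    return 0.5 * (1.0 + math.erf(f / math.sqrt(2.0)))

def p_logistic(f):
    """Equation (5): logistic probability of response."""
    return 1.0 / (1.0 + math.exp(-f))

# Hypothetical parameters for one poison (illustrative only).
a, b = -2.0, 2.0
for dose in (1.0, 10.0, 100.0):
    f = equivalent_deviation(dose, a, b)
    print(dose, round(f, 3), round(p_normal(f), 4), round(p_logistic(f), 4))
```

With these assumed values the equivalent deviation is zero at dose 10, so both distributions give a response probability of one half there.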


3. THE ACTION OF MIXTURES OF POISONS 


The assumption of conditions of simple similar action implies that the poisons making up 
the mixtures have a common mechanism of action. Thus, the basic concepts of the equi- 
valent deviation and the distribution of tolerance values of the equivalent deviation both 
hold good. The methods of approach applied to the problem of quantal responses to a single 
poison may therefore be extended to cover the action of mixtures of poisons. 

If the mixture is made up of w different poisons X, Y, ..., T applied at dose (x, y, ..., t),
the equivalent deviation must take the form of a function of the combined dose, say
f(x, y, ..., t). For any particular organism it is assumed that there exists a tolerance value
of the equivalent deviation such that for combinations of doses leading to equal or
greater values of f(x, y, ..., t) the organism will respond, whereas for combinations leading
to lesser values it will not respond. There will, in general, be a range of values of the dose
(x, y, ..., t) corresponding to a particular value of the equivalent deviation.

To satisfy the conditions of simple similar action it is necessary that the equivalent 
deviation should have the following properties: 


(a) The poisons making up the mixture do not interact. This means that the relative change in the
probability of response (and thus in the equivalent deviation) associated with a small change in the
dose of any one poison is independent of the dose of any of the other poisons.


We have, from (2),

$$P(x, y, \ldots, t) = \int_{-\infty}^{f(x, y, \ldots, t)} \phi(\theta)\, d\theta. \qquad (6)$$

The probability of response at a given dose may therefore be expressed as a function of the equivalent
deviation. Now, without loss of generality, we may assume that the equivalent deviation may be
expressed in the form

$$f(x, y, \ldots, t) = F[g(x, y, \ldots, t)],$$

where g is itself a function of the dose (x, y, ..., t).

Hence we have

$$P(x, y, \ldots, t) = P[g(x, y, \ldots, t)].$$


* This definition corresponds to that given by Berkson (1949) and differs in some respects from the 
definition of the logit suggested by Finney (1952). 







Now consider the effect of making small changes (dx, dy, ..., dt) in the dose. The change in the probability
of response is

$$dP = \sum_{r=x}^{t} \left(\frac{\partial P}{\partial r}\right) dr = \left(\frac{dP}{dg}\right) \sum_{r=x}^{t} \left(\frac{\partial g}{\partial r}\right) dr.$$

Thus the expression $(\partial g/\partial r)$ must be a function of r only. This relationship must hold for all values of r
and hence

$$g(x, y, \ldots, t) = g_x(x) + g_y(y) + \ldots + g_t(t), \qquad (7)$$

where $g_r(r)$ is a function of r only and represents the contribution of the poison R.

(b) If one or more poisons is applied at zero dose the equivalent deviation must reduce to the appro- 
priate form for a mixture of such of the constituent poisons as are applied at non-zero dose. In particular, 
if all but one of the constituent poisons are applied at zero dose the equivalent deviation must reduce 
to the appropriate form for a single poison. Thus for r = x, y, ..., t,

$$g_r(0) = 0, \qquad (8)$$

and

$$f(0, 0, \ldots, r, \ldots, 0) = a_r + b_r \log r. \qquad (9)$$

It is, therefore, necessary that f(x, y, ..., t) should contain the 2w parameters $a_r$, $b_r$.


(c) The equivalent deviation must be a monotonic increasing function of any one of the constituent 
poisons, whatever the values of the doses of the other poisons. That is to say, for any set of values of 
the combined dose an increase in the dose of any one of the constituent poisons must lead to a corre- 
sponding increase in the equivalent deviation and consequently in the probability of response. 

For tolerance distributions such as the normal and logistic, which cover the whole range $(-\infty, \infty)$,
the equivalent deviation must increase monotonically from $-\infty$ to $\infty$ as the combined dose increases
from zero for all poisons to $\infty$ for any one poison.

(d) If the equivalent deviation contains logarithms then it must remain invariant under a change of
the base of the logarithms. Thus, if the equivalent deviation contains the parameters $a_r$ and $b_r$ [where
$a_r$ and $b_r$ characterize the action of the poison R when applied singly] and the base of logarithms is
changed, the corresponding parameters $a_r'$, $b_r'$ must also characterize the action of the poison R under the
new base of logarithms.

If the logarithms are taken to base h, the equivalent deviation for the poison R when applied singly
is

$$f_h(r) = a_r + b_r \log_h r.$$

If the base of logarithms is changed to k we have

$$f_k(r) = a_r + (b_r \log_h k) \log_k r.$$

Hence, under a change of base of logarithms from h to k,

$$a_r' = a_r \quad \text{and} \quad b_r' = (\log_h k)\, b_r.$$

If the equivalent deviation contains parameters $a_r$, $b_r$ (r = x, y, ..., t) and $\theta_1, \theta_2, \ldots, \theta_m$ and the
logarithms are taken to base h or base k,

$$f_h(x, y, \ldots, t; a_r, b_r; \theta_1, \theta_2, \ldots, \theta_m) = f_k(x, y, \ldots, t; a_r, (b_r \log_h k); \theta_1', \theta_2', \ldots, \theta_m'). \qquad (10)$$


(e) Any parameter contained in the equivalent deviation must possess a range of values leading to 
a real value of this function whatever the combined dose. 


It has been shown that the equivalent deviation for a single poison R may be expressed
in the form

$$f_h(r) = a_r + b_r \log_h r = \log_h [h^{(a_r + b_r \log_h r)}].$$

Taken in conjunction with condition (a) this suggests that the equivalent deviation for
a mixture of poisons is of the form

$$f_h(x, y, \ldots, t) = \log_h \sum_{r=x}^{t} h^{(a_r + b_r \log_h r)}. \qquad (11)$$










It may be seen by inspection that expression (11) satisfies conditions (a), (b), (c) and (e).
When, however, the base of logarithms is changed from h to k,

$$f_k(x, y, \ldots, t) = \log_k \sum_{r=x}^{t} k^{(a_r + b_r \log_h k \, \log_k r)},$$

condition (d) above is not satisfied and it is necessary to introduce an additional parameter
θ and to consider an equivalent deviation of the form:

$$f_h(x, y, \ldots, t) = \log_h \left[\sum_{r=x}^{t} h^{(a_r + b_r \log_h r)/\theta}\right]^{\theta}. \qquad (12)$$

If the base of logarithms is changed from h to k we have

$$f_k(x, y, \ldots, t) = \log_k \left[\sum_{r=x}^{t} k^{(a_r + b_r \log_h k \, \log_k r)/\theta'}\right]^{\theta'},$$

where $\theta' = \theta \log_h k$.

An equivalent deviation of the form (12) thus satisfies all the necessary conditions. 
However, the process by which this expression was derived does not (and cannot reasonably 
be expected to) lead to a unique solution and it may well be that there are other functions 
which satisfy the requirements of the situation. In the absence of any detailed information 
about the exact mode of action of the mixture of poisons on the physiological system of the 
subjects concerned, it is considered that preference should be given to the expression which 
is least complicated mathematically and which involves the introduction of the minimum 
number of parameters, provided the experimental evidence throws no doubt on the validity 
of the assumption. The expression (12) is considered to represent the simplest non-trivial 
mathematical form consistent with the conditions given above and includes only one 
parameter in addition to the 2w parameters required to describe the action of the w poisons 
when applied individually. Furthermore, the examination of a considerable body of data 
confirms that this expression does provide an adequate representation of the action of 
mixtures of poisons under conditions of simple similar action. The assumption of the form 
(12) for the equivalent deviation is thus considered to be justified. An alternative derivation, 
based on certain hypotheses concerning the transfer of the poisons from the site of dosage 
to the site of action, has been given by Plackett & Hewlett (1952). 
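
The properties claimed for expression (12) can be checked numerically. The sketch below is only an illustration: the function name, the parameter values and the doses are hypothetical. It evaluates (12) for a two-poison mixture, confirms that it reduces to the single-poison form (9) when the other poison is at zero dose, and confirms invariance under a change of base when $b_r$ and θ are both multiplied by $\log_h k$.

```python
import math

def mixture_deviation(doses, a, b, theta, base=10.0):
    """Expression (12): theta * log_base(sum_r base**((a_r + b_r*log_base(r)) / theta)).

    A poison at zero dose contributes nothing to the sum (this needs b_r/theta > 0).
    At least one dose must be positive, otherwise the logarithm is undefined.
    """
    total = 0.0
    for r, a_r, b_r in zip(doses, a, b):
        if r > 0.0:
            total += base ** ((a_r + b_r * math.log(r, base)) / theta)
    return theta * math.log(total, base)

# Hypothetical parameters for two poisons X and Y, logarithms to base h = 10.
a = [-1.0, -0.5]
b = [2.0, 1.5]
theta = 0.8

f_h = mixture_deviation([3.0, 7.0], a, b, theta, base=10.0)

# Change of base from h = 10 to k = e: b_r' = b_r*log_h(k), theta' = theta*log_h(k).
c = math.log10(math.e)
f_k = mixture_deviation([3.0, 7.0], a, [b_r * c for b_r in b], theta * c, base=math.e)

print(abs(f_h - f_k) < 1e-9)   # invariance under change of base, condition (d)
print(abs(mixture_deviation([5.0, 0.0], a, b, theta) - (a[0] + b[0] * math.log10(5.0))) < 1e-9)
```

Both checks depend only on the algebraic form of (12), so they should hold for any admissible choice of the illustrative values.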

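On writing

$$a_r + b_r \log_h r = l_r, \qquad \sum_{r=x}^{t} l_r = L,$$

(12) may be expressed in the form

$$f(x, y, \ldots, t) = \frac{L}{w} + \theta \log_h \left[\sum_{r=x}^{t} h^{(w l_r - L)/(w\theta)}\right] = \frac{L}{w} + \theta \log_h w + O\left[(w l_r - L)^{2}\right].$$

Now if the contribution from each of the poisons is close to the mean the last term is negligible
and we have

$$f(x, y, \ldots, t) \approx \frac{L}{w} + \theta \log_h w = \frac{1}{w} \sum_{r=x}^{t} (a_r + b_r \log_h r) + \theta \log_h w. \qquad (13)$$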













Thus, if the component doses are such that the terms $(a_r + b_r \log_h r)$ are close to the average,
the equivalent deviation may be expressed as a linear function of the (2w+1) parameters.
Under these conditions constant values of the equivalent deviation, and thus constant
values of the probability of response, are given by loci of the form

$$\sum_{r=x}^{t} b_r \log_h r = \text{constant}, \qquad (14)$$

i.e.

$$\prod_{r=x}^{t} r^{b_r} = \text{constant}. \qquad (15)$$

In the special case where the response lines for all the constituent poisons when applied
singly are parallel (i.e. for $b_r = b$), the expression (14) reduces to the form

$$\sum_{r=x}^{t} \log_h r = \text{constant}.$$

Thus, for mixtures made up of poisons which would be suitable for comparative assay, the
probability of response is approximately constant for combined doses whose components
are such that the sum of their logarithms is constant.
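
The quality of the linear approximation (13) can be examined numerically. The following sketch compares the exact form (12) with (13), working directly with the contributions $l_r = a_r + b_r \log_h r$; the contribution values and θ are hypothetical and the function names are introduced only for this illustration.

```python
import math

def exact_deviation(l, theta, base=10.0):
    """Expression (12) written in terms of the contributions l_r."""
    return theta * math.log(sum(base ** (l_r / theta) for l_r in l), base)

def approx_deviation(l, theta, base=10.0):
    """Approximation (13): mean contribution plus theta*log_base(w)."""
    w = len(l)
    return sum(l) / w + theta * math.log(w, base)

theta = 0.7                      # hypothetical value
equal = [0.40, 0.40, 0.40]       # contributions exactly at their mean
close = [0.40, 0.45, 0.35]       # contributions close to their mean
spread = [0.40, 1.40, -0.60]     # contributions far from their mean

for l in (equal, close, spread):
    print(l, exact_deviation(l, theta) - approx_deviation(l, theta))
```

The difference vanishes when all contributions are equal and grows as they spread about their mean, which is the sense in which the last term of the expansion is negligible.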


4. ESTIMATION OF PARAMETERS 


The estimation of the two parameters associated with quantal responses to a single poison 
has been widely discussed and a number of alternative procedures have been proposed. 
The method of maximum likelihood was first applied to quantal response data by Bliss 
(1935) and has since been adopted in many branches of biological assay. This method, which 
has been described in detail by Finney (1952), involves an iterative process of successive 
approximation to the final result and the necessary calculations are generally rather 
cumbersome. A procedure based on the minimization of the heterogeneity χ² has also been
suggested. For any postulated form of tolerance distribution this involves an iterative
process similar to the maximum likelihood solution and the method does not appear to
have been applied in practice on any considerable scale.

A modified form of the minimum χ² procedure, which has been termed the ‘minimum
logit χ²’ method, has been proposed by Berkson (1944). This method is based on the
minimization of an approximate expression for the heterogeneity χ² and leads, in association
with the assumption of a logistic distribution of tolerances, to a direct solution for the two
parameters. In comparison with the maximum likelihood or minimum χ² methods the
minimum logit χ² procedure permits an appreciable reduction in the effort required to
compute the parameters in any particular case.

In view of the basic similarity between the action of single poisons and that of mixtures
of poisons under conditions of simple similar action, consideration has been given to the
application of both maximum likelihood and minimum logit χ² procedures. It is assumed
that the test subjects are assigned at random to k groups and that the ith group includes
$n_i$ subjects, of which $r_i$ manifest the characteristic response when the group is exposed to
a mixture of w poisons X, Y, ..., T applied at dose $(x_i, y_i, \ldots, t_i)$.


(a) Maximum likelihood method 
The probability of $r_i$ responses in the ith group is given, by the binomial distribution, as

$$P(r_i) = \binom{n_i}{r_i} P_i^{r_i} Q_i^{n_i - r_i}, \qquad (16)$$

where the theoretical probability of response

$$P_i = \int_{-\infty}^{f(x_i, y_i, \ldots, t_i)} \phi(\theta)\, d\theta$$

and $Q_i = 1 - P_i$.


Thus, the logarithm of the likelihood of any given set of observations may be written as

$$L = \text{constant} + \sum_{i=1}^{k} [r_i \log P_i + (n_i - r_i) \log Q_i]. \qquad (17)$$


The maximum likelihood estimates of the (2w+1) parameters u (= θ or $a_r$, $b_r$ [r = x, y, ..., t])
contained in f(x, y, ..., t) may be calculated directly or by the solution of (2w+1) equations
of the form

$$\frac{\partial L}{\partial u} = \sum_{i=1}^{k} \frac{n_i (p_i - P_i)}{P_i Q_i} \left(\frac{\partial P_i}{\partial u}\right) = 0, \qquad (18)$$

where $p_i = r_i/n_i$ is the observed proportion responding.
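The asymptotic variances and covariances of the maximum likelihood estimates may be
evaluated in the usual way and result in

$$[\operatorname{cov}(u, v)] = \left[-E\left(\frac{\partial^{2} L}{\partial u\, \partial v}\right)\right]^{-1} = \left[\sum_{i=1}^{k} \frac{n_i}{P_i Q_i} \left(\frac{\partial P_i}{\partial u}\right) \left(\frac{\partial P_i}{\partial v}\right)\right]^{-1}. \qquad (19)$$

The ‘fit’ of the observations may be tested by calculating the value of the heterogeneity χ².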
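
A minimal sketch of the two quantities used in this subsection follows: the log-likelihood (17), with the constant omitted, and the heterogeneity χ² in the Pearson form that the paper writes out as (20). The group sizes, response counts and fitted probabilities are hypothetical values chosen for illustration.

```python
import math

def log_likelihood(n, r, P):
    """Expression (17) without the additive constant."""
    return sum(r_i * math.log(P_i) + (n_i - r_i) * math.log(1.0 - P_i)
               for n_i, r_i, P_i in zip(n, r, P))

def heterogeneity_chi2(n, r, P):
    """Heterogeneity chi-square: sum of n_i*(p_i - P_i)**2 / (P_i*Q_i)."""
    return sum(n_i * (r_i / n_i - P_i) ** 2 / (P_i * (1.0 - P_i))
               for n_i, r_i, P_i in zip(n, r, P))

# Hypothetical data: three groups of 25 subjects each.
n = [25, 25, 25]
r = [5, 12, 20]
observed = [r_i / n_i for n_i, r_i in zip(n, r)]
fitted = [0.22, 0.45, 0.78]   # hypothetical theoretical probabilities P_i

print(log_likelihood(n, r, fitted), heterogeneity_chi2(n, r, fitted))
```

If the theoretical probabilities coincided with the observed proportions the heterogeneity χ² would be zero and the log-likelihood would take its largest possible value; any fitted model can only do worse on both counts.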











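(b) Minimum logit χ² method
The heterogeneity of the set of observations given above may be expressed in the form

$$\chi^{2} = \sum_{i=1}^{k} \frac{n_i (p_i - P_i)^{2}}{P_i Q_i}. \qquad (20)$$

Unlike the method of maximum likelihood, which may be applied to any form of tolerance
distribution, the minimum logit χ² procedure depends basically on the assumption of a
logistic distribution of tolerances. Under these circumstances we have, from (5),

$$P_i = [1 + \exp(-f(x_i, y_i, \ldots, t_i))]^{-1},$$

$$\frac{dP_i}{df_i} = \frac{\exp(-f_i)}{\{1 + \exp(-f_i)\}^{2}} = P_i Q_i. \qquad (21)$$

Now if $l_i$ is the logit corresponding to the observed proportion $p_i$ we have, for $|p_i - P_i|$ and
$|l_i - f_i|$ small,

$$p_i - P_i = (l_i - f_i) \left(\frac{dP}{df}\right)_{P_i'}, \qquad (22)$$

where $P_i'$ is some value between $p_i$ and $P_i$. Thus from (21)

$$p_i - P_i = (l_i - f_i)\, P_i' Q_i'.$$

Hence

$$\frac{n_i (p_i - P_i)^{2}}{P_i Q_i} = \frac{n_i (P_i' Q_i')^{2} (l_i - f_i)^{2}}{P_i Q_i}.$$

Now

$$P_i' Q_i' \approx P_i Q_i \approx p_i q_i \qquad (23)$$

and

$$\chi^{2} \approx \sum_{i=1}^{k} n_i p_i q_i (l_i - f_i)^{2}. \qquad (24)$$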







The method proposed by Berkson is based on the minimization of the expression (24),
which does not involve the theoretical probabilities of response $P_i$. This may be carried out
directly or by the solution of (2w+1) equations of the form

$$\frac{\partial \chi^{2}}{\partial u} = -2 \sum_{i=1}^{k} n_i p_i q_i (l_i - f_i) \left(\frac{\partial f_i}{\partial u}\right) = 0. \qquad (25)$$


In the case of quantal responses to a single poison the equivalent deviation may be 
expressed as a linear function of the parameters to be estimated and the equations (25) 
reduce to a pair of simultaneous linear equations. When a mixture of poisons is considered, 
however, no direct solution is possible and it is necessary to employ an iterative procedure 
of successive approximation to the final result, as for the maximum likelihood solution. 

The minimum logit χ² estimates are Regular Best Asymptotic Normal (Taylor, 1953)
and the asymptotic variance-covariance matrix is similar to that for the maximum likelihood
estimates.

From (19) and (21) we have

$$[\operatorname{cov}(u, v)] = \left[\sum_{i=1}^{k} n_i P_i Q_i \left(\frac{\partial f_i}{\partial u}\right) \left(\frac{\partial f_i}{\partial v}\right)\right]^{-1}. \qquad (26)$$

An approximate test for the ‘fit’ of the observed data may be obtained by calculating
the value of χ² at the minimum by means of expression (24).
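
For a single poison, where the equivalent deviation is linear in the two parameters, the minimization of (24) amounts to a weighted linear regression of the observed logits on the log dose with weights $n_i p_i q_i$. The sketch below illustrates that direct solution; the data are hypothetical, the function names are introduced only for the example, and every group must have an observed proportion strictly between 0 and 1 for the logit to exist.

```python
import math

def logit_chi2(log_dose, n, r, a, b):
    """Expression (24) for a single poison with f = a + b*x, x being the log dose."""
    total = 0.0
    for x, n_i, r_i in zip(log_dose, n, r):
        p = r_i / n_i
        logit = math.log(p / (1.0 - p))
        total += n_i * p * (1.0 - p) * (logit - (a + b * x)) ** 2
    return total

def min_logit_chi2_single(log_dose, n, r):
    """Direct solution of the two linear equations obtained from (25)."""
    sw = swx = swl = swxx = swxl = 0.0
    for x, n_i, r_i in zip(log_dose, n, r):
        p = r_i / n_i
        w = n_i * p * (1.0 - p)           # weight n_i * p_i * q_i
        logit = math.log(p / (1.0 - p))
        sw += w
        swx += w * x
        swl += w * logit
        swxx += w * x * x
        swxl += w * x * logit
    b = (sw * swxl - swx * swl) / (sw * swxx - swx ** 2)
    a = (swl - b * swx) / sw
    return a, b

# Hypothetical single-poison data.
log_dose = [0.0, 0.5, 1.0, 1.5]
n = [20, 20, 20, 20]
r = [2, 7, 14, 18]

a_hat, b_hat = min_logit_chi2_single(log_dose, n, r)
print(a_hat, b_hat, logit_chi2(log_dose, n, r, a_hat, b_hat))
```

No iteration is needed in this case, which is the practical advantage of the method for a single poison noted in the text.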


5. COMPUTING PROCEDURE 


Under conditions of simple similar action the solution of equations (18) or (25) must involve 
a process of iteration. At least five variables (corresponding to a mixture of two poisons) 
must be taken into account, and the calculations necessary to obtain estimates of a suffi- 
ciently high degree of accuracy are likely to be too complex for the normal techniques of 
statistical computing. The use of an automatic digital calculator does, however, offer the 
possibility of the application of either method of estimation when it is necessary to analyse 
a large number of sets of data. 

Whatever the iterative procedure employed, the necessary computations must take the 
same basic form. For any given set of data the calculation of initial approximations for the 
parameters must be followed by the repeated application of a procedure of successive 
approximation. The choice of a particular process must depend both on the effort required 
to complete each cycle of iteration and also on the number of cycles necessary to achieve 
the desired accuracy. 

Consideration has been given to three possible methods of iteration: the systematic
variation of each parameter in turn, a variant of the ‘steepest descent’ method, and the
Newton-Raphson method. The systematic variation of the parameters involves the
calculation of the extremum of the likelihood or logit χ² function with respect to each of the
parameters in turn, the values of other parameters being held constant. A considerable
effort is therefore required at each cycle of approximation. The variant of the ‘steepest
descent’ method is based on the variation of all the parameters simultaneously along the
normal to the likelihood or logit χ² function in the ‘parameter-space’, the extremum being
determined by reference to the quadratic approximation passing through three specified
points on the normal. The choice of these three points depends only on the values of the
function and its first partial derivatives with respect to the parameters, and the computing



of each cycle of iteration is considerably less complicated than that required for the syste- 
matic variation of the parameters. A major disadvantage is, however, the comparatively 
slow rate of convergence, which is associated with a tendency for successive approximations 
to oscillate about the final result. The Newton-Raphson method, which is normally employed
to obtain the maximum likelihood estimates in the case of quantal responses to a single
poison, has therefore been preferred. Although this method involves the calculation of the
second derivatives of the function at each cycle of iteration, the convergence of successive
estimates to the final result is sufficiently rapid to offset this disadvantage.

The Newton-Raphson process is based on the n-variable form of Taylor’s theorem. For
any given approximation $d_u$ the increments $\delta_u$ for the next cycle are given by (2w+1)
simultaneous linear equations of the form

$$\frac{\partial F(d_u)}{\partial u} + \sum_{v=1}^{2w+1} \delta_v \left\{\frac{\partial^{2} F(d_u)}{\partial u\, \partial v}\right\} = 0, \qquad (27)$$

where F is the likelihood or logit χ²-function. Thus, each cycle of iteration involves the
calculation of (2w+1) first derivatives and (2w+1)(w+1) second derivatives of F.
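
The cycle described by (27) can be sketched in a few lines. The example below is a deliberately simplified illustration and not the author's programme: it applies the Newton-Raphson process to the maximum likelihood fit of a single poison under a logistic tolerance distribution (two parameters rather than 2w+1), with hypothetical data. The gradient and matrix of second derivatives used are the standard ones for the logistic likelihood with a linear equivalent deviation.

```python
import math

def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    M = [list(A[i]) + [rhs[i]] for i in range(size)]
    for c in range(size):
        pivot = max(range(c, size), key=lambda i: abs(M[i][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for i in range(c + 1, size):
            m = M[i][c] / M[c][c]
            for j in range(c, size + 1):
                M[i][j] -= m * M[c][j]
    x = [0.0] * size
    for i in range(size - 1, -1, -1):
        x[i] = (M[i][size] - sum(M[i][j] * x[j] for j in range(i + 1, size))) / M[i][i]
    return x

def newton_raphson(grad, hess, start, tol=1e-10, max_cycles=50):
    """Repeat (27): solve grad + hess*delta = 0 and add delta to the estimate."""
    u = list(start)
    for _ in range(max_cycles):
        delta = solve_linear(hess(u), [-g for g in grad(u)])
        u = [u_j + d_j for u_j, d_j in zip(u, delta)]
        if max(abs(d_j) for d_j in delta) < tol:
            break
    return u

# Hypothetical single-poison data: log dose, group size, number responding.
X = [0.0, 0.5, 1.0, 1.5]
N = [20, 20, 20, 20]
R = [2, 7, 14, 18]

def prob(u, x):
    """Logistic probability of response with f = u[0] + u[1]*x."""
    return 1.0 / (1.0 + math.exp(-(u[0] + u[1] * x)))

def grad(u):
    """First derivatives of the log-likelihood: sum of (r_i - n_i*P_i) * df_i/du."""
    g = [0.0, 0.0]
    for x, n_i, r_i in zip(X, N, R):
        d = r_i - n_i * prob(u, x)
        g[0] += d
        g[1] += d * x
    return g

def hess(u):
    """Second derivatives: minus the sum of n_i*P_i*Q_i * (df_i/du)(df_i/dv)."""
    H = [[0.0, 0.0], [0.0, 0.0]]
    for x, n_i in zip(X, N):
        P = prob(u, x)
        wgt = -n_i * P * (1.0 - P)
        H[0][0] += wgt
        H[0][1] += wgt * x
        H[1][0] += wgt * x
        H[1][1] += wgt * x * x
    return H

a_hat, b_hat = newton_raphson(grad, hess, [0.0, 0.0])
print(a_hat, b_hat)
```

For a mixture the same loop applies with 2w+1 parameters, but the second derivatives of the equivalent deviation no longer vanish and must be included in the matrix.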


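The values of $\partial L/\partial u$ for the maximum likelihood procedure are given in equation (18).
That for $\partial^{2} L/\partial u\, \partial v$ is

$$\frac{\partial^{2} L}{\partial u\, \partial v} = \sum_{i=1}^{k} n_i \left[\frac{(p_i - P_i)}{P_i Q_i} \left(\frac{\partial^{2} P_i}{\partial u\, \partial v}\right) - \frac{(p_i - 2 p_i P_i + P_i^{2})}{P_i^{2} Q_i^{2}} \left(\frac{\partial P_i}{\partial u}\right) \left(\frac{\partial P_i}{\partial v}\right)\right].$$

For the minimum logit χ² procedure, we have, from (25),

$$\frac{\partial^{2} \chi^{2}}{\partial u\, \partial v} = -2 \sum_{i=1}^{k} n_i p_i q_i \left[(l_i - f_i) \left(\frac{\partial^{2} f_i}{\partial u\, \partial v}\right) - \left(\frac{\partial f_i}{\partial u}\right) \left(\frac{\partial f_i}{\partial v}\right)\right] \qquad (28)$$

and

$$E\left(\frac{\partial^{2} \chi^{2}}{\partial u\, \partial v}\right) = 2 \sum_{i=1}^{k} n_i p_i q_i \left(\frac{\partial f_i}{\partial u}\right) \left(\frac{\partial f_i}{\partial v}\right). \qquad (29)$$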


In general, the first and second derivatives of the likelihood function are considerably
more complicated than the corresponding derivatives of the logit χ²-function, even if the
expected values of the second derivatives are substituted for the observed values [as is
commonly done in the analysis of quantal responses to a single poison].

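The assumption of a logistic distribution of tolerances (which is implicit in the minimum
logit χ² procedure) permits an appreciable simplification in the expressions for the partial
derivatives of the likelihood function. Under these circumstances

$$\frac{\partial P_i}{\partial u} = \left(\frac{dP_i}{df_i}\right) \left(\frac{\partial f_i}{\partial u}\right) = P_i Q_i \left(\frac{\partial f_i}{\partial u}\right).$$

Hence

$$\frac{\partial L}{\partial u} = \sum_{i=1}^{k} n_i (p_i - P_i) \left(\frac{\partial f_i}{\partial u}\right), \qquad (30)$$

$$\frac{\partial^{2} L}{\partial u\, \partial v} = \sum_{i=1}^{k} n_i \left[(p_i - P_i) \left(\frac{\partial^{2} f_i}{\partial u\, \partial v}\right) - P_i Q_i \left(\frac{\partial f_i}{\partial u}\right) \left(\frac{\partial f_i}{\partial v}\right)\right], \qquad (31)$$

and

$$E\left(\frac{\partial^{2} L}{\partial u\, \partial v}\right) = -\sum_{i=1}^{k} n_i P_i Q_i \left(\frac{\partial f_i}{\partial u}\right) \left(\frac{\partial f_i}{\partial v}\right). \qquad (32)$$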





Although the expressions (30), (31) and (32) associated with the maximum likelihood 
procedure are slightly more complex than the corresponding expressions (25), (28) and (29) 







for the minimum logit χ² procedure, in that they involve the ‘theoretical’ probabilities of
response $P_i$ rather than the equivalent deviations $f_i$, the difference is marginal. Thus, for
practical purposes, the computations necessary to obtain the maximum likelihood and
minimum logit χ² estimates are comparable, and the relative advantage which the latter
method has in the case of quantal responses to a single poison does not hold for mixtures
of poisons.

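Both methods of estimation involve the calculation of the first and second partial
derivatives of the equivalent deviation, which may be obtained as follows:

From (12),

$$f(x, y, \ldots, t) = \theta \log_h \sum_{r=x}^{t} h^{(a_r + b_r \log_h r)/\theta} = \theta \log_h e \, \ln\left[\sum_{r=x}^{t} \exp\left(\frac{a_r + b_r \log_h r}{\theta \log_h e}\right)\right].$$

Thus, for l = x, y, ..., t,

$$\frac{\partial f}{\partial a_l} = \exp\left(\frac{a_l + b_l \log_h l}{\theta \log_h e}\right) \Big/ \sum_{r=x}^{t} \exp\left(\frac{a_r + b_r \log_h r}{\theta \log_h e}\right) = h^{(a_l + b_l \log_h l - f)/\theta}, \qquad (33)$$

$$\frac{\partial^{2} f}{\partial a_l^{2}} = \frac{1}{\theta \log_h e} \left[1 - \frac{\partial f}{\partial a_l}\right] \left(\frac{\partial f}{\partial a_l}\right). \qquad (34)$$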

Similarly, the remainder of the first and second partial differential coefficients may be
generated by a series of relatively simple arithmetic operations, and the substitution of
the expected values of the second derivatives for the observed values would lead to a relatively
small reduction in the calculation of the Newton-Raphson equations. The use of
this approximation tends to decrease the speed of convergence of the iterative process and,
on balance, it is considered that the observed values should be employed.

The speed of convergence of the procedure will depend on the choice of initial approximation
$a_r^0$, $b_r^0$ and $\theta^0$. As a general rule it is possible to obtain values for $a_r^0$ and $b_r^0$ by carrying out
an analysis on the marginal distribution of r. In view of the relative simplicity of the
calculations under the minimum logit χ² procedure the use of this method is to be preferred,
whether or not the final method of estimation is to be maximum likelihood or minimum
logit χ².

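The calculation of $\theta^0$, given initial approximations for $a_r$ and $b_r$, may be carried out as
follows:

From (25),

$$\sum_{i=1}^{k} \left[n_i p_i q_i (l_i - f_i) \left(\frac{\partial f_i}{\partial u}\right)\right] = 0$$

at the minimum of the χ² function. Summing over all values of $u = a_r$ (r = x, y, ..., t) we have

$$\sum_{i=1}^{k} \left[n_i p_i q_i (l_i - f_i) \left\{\sum_{r=x}^{t} \frac{\partial f_i}{\partial a_r}\right\}\right] = 0.$$

From (33),

$$\sum_{r=x}^{t} \frac{\partial f_i}{\partial a_r} = 1.$$

Hence

$$\sum_{i=1}^{k} n_i p_i q_i (l_i - f_i) = 0. \qquad (35)$$

Further, from (13),

$$f_i \approx \frac{1}{w} \sum_{r=x}^{t} (a_r + b_r \log r_i) + \theta \log w.$$

Thus, a first approximation for θ may be obtained by means of the expression

$$\theta^{0} = \frac{\sum_{i=1}^{k} n_i p_i q_i \left[l_i - \frac{1}{w} \sum_{r=x}^{t} (a_r^{0} + b_r^{0} \log r_i)\right]}{\log w \sum_{i=1}^{k} n_i p_i q_i}. \qquad (36)$$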





This process is based on the initial estimates $a_r^0$ and $b_r^0$ and involves two further stages of
approximation. The estimate of θ may therefore be subject to considerable error, particularly
if a number of the $(a_r + b_r \log r)$ differ markedly, and it may well be considered
preferable to base the initial value on previous experience.
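
Expression (36) is a simple weighted average and can be sketched as follows. The data, the initial values $a_r^0$, $b_r^0$ and the function name are hypothetical; logits are taken as natural logarithms of $p_i/q_i$ in line with (5), and base-10 logarithms of dose are an assumption of the example.

```python
import math

def initial_theta(n, r, log_doses, a0, b0, base=10.0):
    """First approximation (36) for theta.

    log_doses[i] holds the base-h log doses of the w poisons for group i;
    every group needs 0 < r_i < n_i so that the observed logit exists.
    """
    w = len(a0)
    num = den = 0.0
    for n_i, r_i, logs in zip(n, r, log_doses):
        p = r_i / n_i
        weight = n_i * p * (1.0 - p)                      # n_i * p_i * q_i
        logit = math.log(p / (1.0 - p))                   # observed logit l_i
        mean_l = sum(a_r + b_r * x for a_r, b_r, x in zip(a0, b0, logs)) / w
        num += weight * (logit - mean_l)
        den += weight
    return num / (math.log(w, base) * den)

# Hypothetical data for a mixture of two poisons.
n = [30, 30, 30]
r = [6, 15, 24]
log_doses = [(0.2, 0.1), (0.6, 0.5), (1.0, 0.9)]
a0 = [-1.5, -1.2]
b0 = [2.0, 2.2]

theta0 = initial_theta(n, r, log_doses, a0, b0)
print(theta0)
```

As the text cautions, the value obtained rests on approximation (13) and on the quality of the initial $a_r^0$ and $b_r^0$, so it is best regarded as a starting point only.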


6. CONCLUSIONS 


In theory, the assumed form of tolerance distribution should conform exactly to the 
true relationship between the equivalent deviation and the probability of response. In 
practice, however, this true relationship is not known and the choice of a particular dis- 
tribution must be based on other considerations. Examination of the published work on 
quantal responses to a single poison shows that the method of probit analysis is generally 
preferred. The assumption of a normal distribution of tolerances has received wide support, 
mainly on the grounds that the distribution probably provides a realistic description of 
the fundamental processes which take place when a poison is applied. However, the 
logistic distribution agrees very closely with the normal distribution for response rates of 
between 0-01 and 0-99, which covers the whole of the working range of responses, and it is 
unlikely that sufficient information would ever be available to discriminate between the 
two. Even if the true distribution of tolerances were, in fact, normal, any conclusions based 
on the assumption of a logistic distribution would be unlikely to be seriously affected. In 
the absence of any strong theoretical evidence it is therefore considered that the logistic 
distribution is to be preferred, in view of the fact that the calculations involved in the 
estimation of the parameters are less complicated than those associated with the normal 
distribution. Under this assumption both maximum likelihood and minimum logit χ²
estimation may be employed. 
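
The closeness of the logistic and normal curves can be checked numerically. The sketch below is illustrative only: the scale factor 1.702 used to match the logistic to the standard normal is an assumption introduced here (it is a commonly quoted matching constant, not a value taken from this paper), and the grid of deviates is arbitrary.

```python
import math
from statistics import NormalDist

# Assumed scale factor matching the logistic curve to the standard normal;
# this constant is not taken from the paper.
SCALE = 1.702

def logistic_cdf(x, scale=SCALE):
    """Logistic response curve evaluated at the normal deviate x."""
    return 1.0 / (1.0 + math.exp(-scale * x))

normal = NormalDist()

# Compare the two response curves over deviates from -4 to +4.
grid = [i / 100.0 for i in range(-400, 401)]
max_diff = max(abs(normal.cdf(x) - logistic_cdf(x)) for x in grid)

print(f"largest absolute difference in response rate: {max_diff:.4f}")
```

Under this assumed scaling the two response rates never differ by as much as one percentage point, which is in keeping with the remark that data would rarely discriminate between the two forms.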

The relative merits of the two methods of estimation have formed the subject of considerable
controversy. The minimum logit χ² estimates belong to the class of Regular Best
Asymptotic Normal estimates, in the sense of Neyman (1949) and thus have the same 
asymptotic properties as the maximum likelihood estimates. In the case of quantal re- 
sponses to a single poison the behaviour of the two methods when applied to ‘small’ 
samples (of the sizes that are normally employed in biological assay) has been considered. 
It has been shown by Miller (1950), Berkson (1952) and Armitage & Allen (1950) that corre- 
sponding estimates agree extremely closely for a wide variety of experimental data, provided 
the true distribution of tolerances does not show any marked departure from the logistic 
distribution. Where there are any appreciable differences between the results obtained 
these may be attributed to the procedures adopted for handling response rates of zero or 
100%, for which the minimum logit χ² estimates become indeterminate.

Doubts have been expressed (Silverstone, 1956) about the validity of the indiscriminate
application of the 1/(2n)-rule proposed by Berkson (1955) to overcome this difficulty. It
has been shown that the estimates obtained by the application of this rule may not be
statistically sufficient and that the justification of this property given by Berkson is open
to question. A further objection is the tendency for the value of χ² to become unstable
when the number of individuals at any particular dose is small. The difficulty of handling
response rates of zero and 100% may be partially overcome by exercising reasonable
discretion in the use of the 1/(2n)-rule, but the instability of the values of χ² is a more serious
disadvantage. In circumstances where the experimental conditions are not subject to
control by the investigator, small numbers of individuals may occur at any of the levels of
dosage. The difficulty may be offset by taking together the groups with similar levels of
dosage, but this process must inevitably lead to some loss of information and the need to
pool groups which contain a small number of subjects is thus a definite objection. On general
grounds the minimum logit χ² procedure has no intrinsic advantages over the maximum
likelihood procedure as a method of estimation and, as the calculations involved under the
assumption of a logistic distribution of tolerances are comparable, it is considered that the
method of maximum likelihood is to be preferred.

The reasons for selecting the logistic function to represent the tolerance distribution 
and the maximum likelihood method of estimation apply whatever the assumed form of the 
equivalent deviation, provided this is not a linear function of the parameters to be estimated. 
Thus, the advantages of the method of approach hold good under general conditions of 
similar joint action, whether or not the poisons interact. Indeed, the use of an automatic 
digital calculator offers the possibility of analysing data of this type, even though the 
model for the equivalent deviation may be extremely complicated. 


7. EXAMPLE 


The application of the procedures described above may be illustrated by an example from 
the field of research into the causes of pneumoconiosis amongst coal miners. Experience 
has shown that there are, in general, three distinct levels of dust concentration, differing 
both in average value and variability. They are associated with work on the coal-face, on the 
coal-getting and preparation shifts, and elsewhere underground, respectively. The chance 
of having pneumoconiosis is a function of the periods spent in these three types of environ- 
ment. Thus, if ‘response’ is regarded as having pneumoconiosis and ‘dose’ is measured by 
the periods spent on the coal-face, on the coal-getting and preparation shifts, and elsewhere 
underground, the situation is equivalent to quantal responses to a mixture of three poisons 
under conditions of simple similar action. 

The data obtained at one particular colliery are illustrated in Table 1, which shows that 
the majority of the men had spent an appreciable period of time in more than one class of 
environment. To reduce the numbers of groups under consideration the results for men with 
a similar level of exposure have been pooled and the values for the periods spent in the 
three types of environment are weighted averages. 

The initial estimates of a_x and b_x were obtained by carrying out a minimum logit χ²
analysis for x only, for the groups which cover men with a comparatively short period of
exposure in the other environments. Similar analyses were performed to obtain the initial
estimates of a_y, b_y, a_z and b_z. The calculated values are

$$a_x^0 = -6.9, \quad a_y^0 = -6.0, \quad a_z^0 = -5.2,$$
$$b_x^0 = 5.2, \quad b_y^0 = 4.0, \quad b_z^0 = 2.9.$$

The estimate of θ obtained by means of equation (36) is

$$\theta^0 = 7.5.$$








Table 1. Exposure to dust and incidence of pneumoconiosis for groups of mine workers





Period spent (years) 





. Coal-face, 
coal- 
getting 
(x) 


0-25 
0-25 
0-25 
0-25 
0-25 








Coal-face, 
pre- 
paration 


(y) 





Elsewhere 
under- 
ground 


(z) 


0-25 
1-74 
5°15 
10-65 
24-31 


37°44 
0-25 
3-60 
0-25 
0-25 


2-04 
6-11 
44-25 
32-62 
23-00 


20-71 
1-25 
2-73 

13-94 
1-43 


19-00 
23-25 
14-71 
5:38 
0-25 


4-53 
37-91 
13-21 
22-75 

0-25 


10-67 
0-80 
9-21 

26-75 
0-25 


21-23 
0-25 
35-44 
3°33 
15-67 


3°44 
0-25 
9-29 
17-88 
24-09 


2-00 
0-25 
2-50 
10-20 


| 








No. of 
men 


(n) 


. 
| Proportion |_ 


with 
| pneumo- 
coniosis 


(p) 











Period spent (years) 


Coal-face, 
coal- 
getting 
(x) 





| 





Coal-face, | Elsewhere 
pre- 
paration 
(y) 








under- 
ground 


(z) 


6-00 
32-55 
2-62 
20-70 
11-36 


0-25 
0-25 
7-75 
7-40 
0-25 


27-50 
0-25 
16-88 
5-93 
0-25 


1-68 
2-83 
16-00 
3-67 
2-93 


0-25 
9-45 
20-83 
6-64 
1-91 


0-25 
18-17 
0-25 
0-25 
16-00 


2-80 
1-67 
8-50 
1-67 
5-73 


0-25 
12-33 
0-25 
2-36 
6-83 


7-67 
9-57 
1-91 
0-25 
0-25 


3-14 
8-38 
2°88 
0-25 





10 w+] 


——_— 

| 

| Proportion 
with 

pneumo- 

coniosis 


(p) 













Starting from these initial values, the Newton—Raphson procedure was applied to obtain 
the maximum likelihood estimates based on the logistic distribution of tolerances, by means 
of expressions (27), (30) and (31), and the variance-covariance matrix was determined by
means of expressions (19) and (32). The iterative process was discontinued when the
first-order partial derivatives of the likelihood function with respect to each of the parameters
were zero to five decimal places. The pattern of the computations was typical of the majority
of analyses of this type, in that the initial and final cycles of iteration produced relatively 
large changes in the parameter values, whereas the intermediate cycles had a much smaller 
effect. The convergence of the final iterations was particularly rapid. The final estimates 
of the parameters were as follows: 


$$\hat a_x = -6.77, \quad \hat a_y = -9.66, \quad \hat a_z = -4.55,$$
$$\hat b_x = 5.09, \quad \hat b_y = 6.14, \quad \hat b_z = 2.30,$$
$$\hat\theta = 3.35.$$


It will be seen that the initial approximations were most accurate for a_x and b_x, but that the
final value for θ differed considerably from the initial estimate.
The variances of the estimates are

$$\operatorname{var} \hat a_x = 0.679, \quad \operatorname{var} \hat a_y = 14.335, \quad \operatorname{var} \hat a_z = 0.229,$$
$$\operatorname{var} \hat b_x = 0.346, \quad \operatorname{var} \hat b_y = 7.999, \quad \operatorname{var} \hat b_z = 0.127,$$
$$\operatorname{var} \hat\theta = 0.627.$$

The covariances of corresponding values of a_r, b_r are

$$\operatorname{cov}(\hat a_x, \hat b_x) = -0.480, \quad \operatorname{cov}(\hat a_y, \hat b_y) = -10.537, \quad \operatorname{cov}(\hat a_z, \hat b_z) = -0.152.$$

It thus appears that the estimates relating to y (the period spent on the coal-face, preparation
shifts) are known with the least precision. On general grounds this is not unexpected,
in view of the variability of the dust concentrations associated with this environment.

The hazard relating to a particular environment may conveniently be expressed in terms
of the ED 50, the 'dose' at which 50% of the population may be expected to manifest the
characteristic response if the men concerned were exposed only to the given environment.
If this quantity is denoted by r_50 and R_50 = log r_50 (r = x, y, z), we have, from (3) and (5),

$$R_{50} = -a_r / b_r.$$

The 95% fiducial limits for the ED 50's are

    X_50 = 1.33 ± 0.05, i.e. x_50 = 21 years (limits 19 to 24 years),
    Y_50 = 1.57 ± 0.32, i.e. y_50 = 37 years (limits 18 to 79 years),
and Z_50 = 1.98 ± 0.31, i.e. z_50 = 95 years (limits 47 to 193 years).
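
The ED 50 figures can be re-derived from the quoted estimates. The sketch below is illustrative and rests on two assumptions that are not stated in this part of the text: that the logarithms are to base 10 (suggested by 1.33 corresponding to about 21 years), and that a first-order (delta-method) variance for the ratio −a/b is an adequate stand-in for the fiducial calculation actually used. The function and variable names are hypothetical.

```python
import math

# Final estimates, variances and covariances as quoted in the text.
estimates = {
    "x": dict(a=-6.77, b=5.09, var_a=0.679, var_b=0.346, cov_ab=-0.480),
    "y": dict(a=-9.66, b=6.14, var_a=14.335, var_b=7.999, cov_ab=-10.537),
    "z": dict(a=-4.55, b=2.30, var_a=0.229, var_b=0.127, cov_ab=-0.152),
}

def log_ed50(a, b):
    """Log dose at which the logit a + b*log(dose) is zero."""
    return -a / b

def half_width(a, b, var_a, var_b, cov_ab, z=1.96):
    """Approximate 95% half-width for -a/b (delta method; an assumption,
    not the fiducial argument of the paper)."""
    r = -a / b
    var_r = (var_a + 2.0 * r * cov_ab + r * r * var_b) / (b * b)
    return z * math.sqrt(var_r)

results = {}
for env, e in estimates.items():
    r = log_ed50(e["a"], e["b"])
    hw = half_width(**e)
    # Base-10 antilogarithms are assumed when converting back to years.
    results[env] = (r, hw, 10 ** r, 10 ** (r - hw), 10 ** (r + hw))
    print(f"{env}: log ED50 = {r:.2f} +/- {hw:.2f}, "
          f"ED50 = {results[env][2]:.0f} years "
          f"({results[env][3]:.0f} to {results[env][4]:.0f})")
```

The approximate half-widths come out close to the quoted ±0.05, ±0.32 and ±0.31, so the first-order calculation is a reasonable check on the published limits even though it is not the same procedure.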


The value of χ² relating to the final estimates of the parameters is 111.9, which corresponds
to a significance level of 7%. Only 9 of the 98 components of χ² relating to the
individual groups are in excess of 3.0. There is thus no evidence that the observed data
deviate significantly from the hypothetical model on which the analysis is based.
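
The quoted significance level can be checked approximately. The sketch below assumes 98 − 7 = 91 degrees of freedom (98 groups less the seven fitted parameters; the text does not state the degrees of freedom explicitly) and uses the Wilson-Hilferty normal approximation to the χ² distribution rather than exact tables.

```python
import math
from statistics import NormalDist

def chi2_upper_tail(chi2, df):
    """Upper-tail probability of chi-square by the Wilson-Hilferty
    cube-root normal approximation (an approximation, not exact)."""
    c = 2.0 / (9.0 * df)
    z = ((chi2 / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 1.0 - NormalDist().cdf(z)

n_groups = 98
n_params = 7               # assumed: a_x, b_x, a_y, b_y, a_z, b_z and theta
df = n_groups - n_params   # assumed degrees of freedom

p_value = chi2_upper_tail(111.9, df)
print(f"approximate significance level: {100 * p_value:.1f}%")
```

Under these assumptions the approximation gives a level of roughly 7%, in line with the figure quoted.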


This paper is published by permission of the National Coal Board. 










REFERENCES 


ARMITAGE, P. & ALLEN, I. (1950). Methods of estimating the LD 50 in quantal response data. J. Hyg.,
Camb., 48, 298-322.

BERKSON, J. (1944). Application of the logistic function to bio-assay. J. Amer. Statist. Ass. 39, 357-65.

BERKSON, J. (1949). Minimum χ² and maximum likelihood solutions in terms of a linear transform,
with particular reference to bio-assay. J. Amer. Statist. Ass. 44, 273-78.

BERKSON, J. (1952). Relative precision of minimum chi-square and maximum likelihood estimates of
regression co-efficients. Ann. Math. Statist. 23, 148.

BERKSON, J. (1955). Maximum likelihood and minimum χ² estimates of the logistic function. J. Amer.
Statist. Ass. 50, 130-62.

BLISS, C. I. (1935). The calculation of the dosage-mortality curve. Ann. Appl. Biol. 22, 307-330.

CRAMÉR, H. (1946). Mathematical Methods of Statistics. Princeton University Press.

FECHNER, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf and Härtel.

FINNEY, D. J. (1952). Statistical Method in Biological Assay. London: Charles Griffin and Co.

GADDUM, J. H. (1933). Reports on biological standards. III. Methods of biological assay depending
on a quantal response. M.R.C. Special Report Series, no. 183.

KNUDSEN, L. F. & CURTISS, J. M. (1945). The use of the angular transformation in biological assay.
J. Amer. Statist. Ass. 42, 282-96.

MILLER, L. C. (1950). Biological assays involving quantal responses. Ann. N.Y. Acad. Sci. 52, 903-19.

NEYMAN, J. (1949). Contribution to the theory of the χ² test. Proceedings of the First Berkeley
Symposium on Mathematical Statistics and Probability. University of California Press.

PLACKETT, R. L. & HEWLETT, P. S. (1952). Quantal responses to mixtures of poisons. J.R. Statist.
Soc. B, 14, 141-63.

SILVERSTONE, M. (1956). Private communication.

TAYLOR, W. F. (1953). Distance functions and best asymptotic normal estimates of regression
co-efficients. Ann. Math. Statist. 24, 85-92.

WILSON, E. B. & WORCESTER, J. (1943). Bioassay for a general curve. Proc. Nat. Acad. Sci., Wash.,
29, 79-85.










SOME PROPERTIES OF RUNS IN QUALITY 
CONTROL PROCEDURES 


By P. G. MOORE 
University College London 


1. The usual type of control chart for average values is based on the means of successive 
samples of some fixed size, n. The limits commonly placed on the chart correspond to pro- 
babilities such as 0-998 or 0-99 that the sample mean falls inside the limits when the system 
is operating satisfactorily. If the system goes out of control due to the average level altering,
the two extreme situations that may have occurred are: 

(i) The average value of the population of items being manufactured may change slightly
due to tool wear, or a fresh batch of raw material, or a slight variation in the voltage of
the power supply and so forth.

(ii) A large change in the average value takes place due to something going wildly out of 
control. This happens when mistakes are made in the manufacturing process, such as bolts 
being left undone in some machine or wrong methods being used by one of the operators. 

There are other faults that could occur which would affect the standard deviation of the 
manufactured items but we shall assume here that the standard deviation remains constant 
throughout. 

Usually the economics behind any inspection scheme limits the amount of sampling that 
can be done and it is necessary to decide whether to take, on the one hand, small samples 
fairly frequently or, on the other hand, large samples rather more infrequently. A large 
sample has more chance of picking out a change of type (i) whereas changes of type (ii) would 
be easily picked out by both large and small samples. As small samples are taken more 
frequently changes of type (ii) would be detected quicker, on average, by such samples. 


2. Since small samples have desirable properties for type (ii) changes it is worth seeing
whether their performance for type (i) changes cannot be improved. One method of approach
is to consider the sample means in bunches rather than in isolation. Mosteller (1941)
considered the number of runs of means that were above or below the median, and Olmstead
(1946) used the number of runs up and runs down as his basis for a test. Weiler (1953)
suggests that we should stop production when a specified number, T, of means in succession
fall outside the control limits set up for the scheme. This latter rule is the one that we follow
here. There are other possible rules based on using two limits, warning and action, in
conjunction with runs. Dudding & Jennett (1944) gave a brief discussion of the possibilities
and Page (1955) gave some tables for such schemes. Some comparisons with this type of
scheme are made in § 6 below.


3. If the rule adopted is to stop production when T successive means fall beyond a specified
control limit, the appropriate position for this limit (or pair of limits) will vary with T.
There are several ways in which the effects of schemes with different values of T can be
equated in order that the schemes can be regarded as equivalent when the population
average is remaining constant. The method adopted here is to make the average number of
samples drawn before a stoppage occurs the same whatever the value of T that is being used.







If p is the probability that one single mean value falls beyond the control limit, then the
average number of means that must be observed before a run of T successive means fall
outside the limit is

$$\frac{1-p^{T}}{(1-p)\,p^{T}}. \tag{1}$$

This result is derived, for example, in Feller (1950, p. 266). If T is equal to unity, (1) reduces
to 1/p. Having chosen this value of p in the customary control chart manner, we now choose
for each value of T a value, p_T, such that

$$\frac{1-p_T^{T}}{(1-p_T)\,p_T^{T}} = \frac{1}{p}. \tag{2}$$

Let the mean and standard deviation of the controlled population be ξ and σ, respectively.
Then if the upper control limit for the mean of a sample of n is put at ξ + λσ/√n, where λ is
chosen to make the probability of falling beyond the limit equal to p, we would have to find
a new value λ_T such that p_T is the probability of one mean falling beyond ξ + λ_T σ/√n.
Equation (2) can be solved to give the values of p_T that correspond to p for various values
of T and hence λ_T is found. In the next two sections we consider in particular T equal to two,
three and four. Longer sequences than four would not in general be acceptable, because
such sequences impose quite a delay before a change in mean can be detected. The values
of p_T and λ_T obtained by this method for the two sets of limits considered in the next section
are:





Average number of samples examined before production stopped unnecessarily,
i.e. when no change in population mean

        |              1000               |               200
  T     |   1       2       3       4     |   1       2       3       4
  p_T   | 0.001   0.0321  0.1037  0.1873  | 0.005   0.0732  0.1825  0.2891
  λ_T   | 3.090   1.850   1.261   0.888   | 2.576   1.452   0.906   0.556
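
The solution of equation (2) can be sketched numerically. The code below is an illustration, not the method of computation used for the table: it solves (2) for p_T by bisection and converts p_T to λ_T on the assumption of a normally distributed sample mean and a single upper limit. The function names are hypothetical.

```python
from statistics import NormalDist

def average_run(p, T):
    """Expression (1): average number of means observed before a run of
    T successive means falls beyond the limit."""
    return (1.0 - p ** T) / ((1.0 - p) * p ** T)

def solve_p_T(target, T, tol=1e-12):
    """Solve equation (2) for p_T by bisection; expression (1) decreases
    steadily as p increases."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if average_run(mid, T) > target:
            lo = mid    # run still too long, so a larger p is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lambda_T(p_T):
    """One-sided normal deviate with upper-tail probability p_T
    (normality of the sample mean is assumed)."""
    return NormalDist().inv_cdf(1.0 - p_T)

table = {}
for target in (1000, 200):
    for T in (1, 2, 3, 4):
        p = solve_p_T(target, T)
        table[(target, T)] = (p, lambda_T(p))
        print(f"average run {target:4d}, T = {T}: "
              f"p_T = {p:.4f}, lambda_T = {table[(target, T)][1]:.3f}")
```

For T = 2 equation (2) reduces to the quadratic (1 + p_T)/p_T² = 1/p, which provides a convenient hand check on the bisection.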











4. To examine the utility of this procedure we will now investigate the situation when the
population mean does change from its nominal or assumed value. Two possible approaches
are to consider (a) the average number of samples required before a stoppage occurs or
(b) the probability that, after m samples have been drawn from the population with the
changed mean, at least one run of T means has occurred outside the control limit. The first
approach gives an overall figure, whilst the second, considered as a function of m, gives a
more complete picture of the situation after any given number of samples have been
drawn.

For (a) a revised value of p_T can be calculated after any given shift has occurred in the
mean value, and equation (1) can then be used to find the average run before T successive
means fall beyond the control limit.
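
Approach (a) can be sketched as follows. The code is illustrative: it assumes normally distributed sample means, a single upper limit at ξ + λ_T σ/√n, and a shift towards that limit expressed in units of σ/√n. The λ_T values are those tabulated in § 3 for an in-control average of 200 samples; the function names are hypothetical.

```python
from statistics import NormalDist

def average_run(p, T):
    """Expression (1) for the average run before T successive exceedances."""
    return (1.0 - p ** T) / ((1.0 - p) * p ** T)

def average_run_after_shift(lam, T, shift):
    """Average run once the mean has moved `shift` units of sigma/sqrt(n)
    towards the limit (normal sample means are assumed)."""
    p_shifted = 1.0 - NormalDist().cdf(lam - shift)   # revised p_T
    return average_run(p_shifted, T)

# lambda_T for an in-control average of 200 samples, from the table in section 3.
inner_limits = {1: 2.576, 2: 1.452, 3: 0.906, 4: 0.556}

for T, lam in inner_limits.items():
    runs = [average_run_after_shift(lam, T, d) for d in (0.5, 1.0, 1.5)]
    print(f"T = {T}: " + ", ".join(f"{r:.1f}" for r in runs))
```

With no shift the function returns approximately 200 for each T, as it should by construction, and the shifted values can be compared with the entries of Table 1.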

For (b) it is simplest to compute the probability that no run of length T has occurred in
m trials. Bateman (1948) developed a combinatorial method for finding the probability of
the longest run being of length T. With some fairly straightforward modifications this can
be developed into a method for giving the probabilities that no run of at least T has taken
place. The actual enumeration, which Bateman gives for certain values of m between 10
and 20, is fairly tedious and for large m it is essential to seek some other method. Feller
(1950, chapter 13) develops a method for obtaining the probability that in m trials the first
run of length T occurs at the mth trial. He gives this probability, f_m, as

$$f_m \sim \frac{(x-1)(1-\mu x)}{(T+1-Tx)(1-\mu)}\,\frac{1}{x^{m+1}}, \tag{3}$$

where μ is the probability of one individual being beyond the control limit and x is the unique
positive root of

$$(1-\mu)\,s\,(1+\mu s+\dots+\mu^{T-1}s^{T-1}) = 1. \tag{4}$$

Hence, our required probability of no runs of length T is the sum of (3) over all trials from
the (m+1)th onwards and this gives the expression q_m, where

$$q_m \sim \frac{1-\mu x}{(T+1-Tx)(1-\mu)}\,\frac{1}{x^{m+1}}. \tag{5}$$

Some experiments were made to test the accuracy of this approximation for m = 10 and
for runs of two, three and four. As Feller himself has demonstrated the accuracy when μ = ½,
attention was concentrated on the cases when μ < ½. The exact values were obtained by
enumeration of the possible runs and the approximate results obtained from (5). The latter
were very good and quite adequate for the purpose in hand. Some specimen figures obtained
are given below and show only minor discrepancies in the fourth decimal place.






































          |       T = 2        |       T = 3        |       T = 4
    μ     | Exact    Approx.   | Exact    Approx.   | Exact    Approx.
          |  q_m       q_m     |  q_m       q_m     |  q_m       q_m
   0.4    | 0.7103   0.7100    | 0.3143   0.3142    | 0.1168   0.1167
   0.3    | 0.5036   0.5036    | 0.1551   0.1551    | 0.0420   0.0420
   0.2    | 0.2733   0.2734    | 0.0523   0.0524    | 0.0093   0.0093
   0.1    | 0.0803   0.0803    | 0.0073   0.0073    | 0.0006   0.0006
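
The comparison between exact and approximate values can be reproduced in outline. The sketch below is an illustration under stated assumptions: the exact probability of no run of length T in m independent trials is obtained by a simple recursion over the current run length (a substitute for the enumeration described in the text), and the approximation follows expression (5) with x found from equation (4) by bisection. The function names are hypothetical.

```python
def prob_no_run_exact(mu, T, m):
    """Probability that m independent trials contain no run of T
    exceedances; state[k] is the probability of a current run of length k."""
    state = [1.0] + [0.0] * (T - 1)
    for _ in range(m):
        total = sum(state)
        # A non-exceedance resets the run; an exceedance lengthens it.
        # Paths reaching a run of length T are dropped.
        state = [(1.0 - mu) * total] + [mu * s for s in state[:-1]]
    return sum(state)

def feller_root(mu, T, tol=1e-14):
    """Positive root x of equation (4), located by bisection."""
    def g(s):
        return (1.0 - mu) * s * sum((mu * s) ** k for k in range(T)) - 1.0
    lo, hi = 1.0, 2.0
    while g(hi) <= 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def prob_no_run_approx(mu, T, m):
    """Expression (5) for the probability of no run of length T in m trials."""
    x = feller_root(mu, T)
    return (1.0 - mu * x) / ((T + 1 - T * x) * (1.0 - mu)) / x ** (m + 1)

for mu in (0.4, 0.3, 0.2, 0.1):
    for T in (2, 3, 4):
        exact = prob_no_run_exact(mu, T, 10)
        approx = prob_no_run_approx(mu, T, 10)
        print(f"mu = {mu}, T = {T}: exact {exact:.4f}, approx {approx:.4f}")
```

When this sketch is run for m = 10, the complements 1 − q_m (the probability of at least one run) appear to be the quantities that match the specimen figures; for μ = 0.1 and T = 2, for instance, the complement is about 0.080.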





5. In order to compare different sizes of sample, different lengths of run and different 
changes in the mean simultaneously it is simplest to keep one of the possible variables fixed. 
In this case it was decided to relate the change in the mean to the size of the sample, n, that 
is being used and consider changes in the mean of three magnitudes, namely 


    0.5σ/√n,   1.0σ/√n,   1.5σ/√n.


By doing this the results become, for the purposes of these illustrations, independent of the 
size of sample as, whatever the shift, the probability of falling beyond the limits specified 
is the same for all sizes of sample. 

Two significance limits were utilized, described as inner and outer. Limits ξ + λ_T σ/√n
were chosen as described in § 3, such that if there was no shift in population mean the
average number of samples of n which must be observed before a run of T means exceeded
the corresponding limit was (a) 200 for the inner limit, (b) 1000 for the outer limit. Figs. 1-6
show the probabilities that after a specified number of samples, m, have been drawn there
will have been at least one stoppage due to T successive means falling beyond the level
prescribed. The figures show that there is always some gain in using a higher value of T,
although the actual gain obtained when increasing T by one decreases as T gets larger.
As a rough and ready guide it appears that T = 2 is the most suitable value to use. It has
great advantages over T = 1 and avoids the inevitable minimum delay in picking out a
large change which must occur when a decision has to await at least three (or four) sample
results.





[Figs. 1-6. Left-hand scale: probability of detection in m samples. Right-hand scale:
probability of failing to detect change in m samples. Horizontal scale: number of samples, m.]

Fig. 1. Use of inner limit with shift of 0.5σ/√n.
Fig. 2. Use of outer limit with shift of 0.5σ/√n.
Fig. 3. Use of inner limit with shift of σ/√n. N.B. Curve for T = 4 is virtually the same as for T = 3.
Fig. 4. Use of outer limit with shift of σ/√n.
Fig. 5. Use of inner limit with shift of 1.5σ/√n. N.B. Curves for T = 3 or 4 are virtually the same as for T = 2.
Fig. 6. Use of outer limit with shift of 1.5σ/√n.


Table 1 gives comparative results in terms of average number for each of the schemes
illustrated. For very large shifts it will be noticed that the larger values of T seem unsuitable,
because the necessary minimum delay prevents the average coming down.


6. Page (1955) has investigated the properties of inspection rules that are based on both
warning and action limits operating in conjunction. The calculation of the probability
that action has been taken after m samples have been observed is difficult for such schemes.
Hence, we will utilize the average run length (A.R.L.) to compare (a) two specimen schemes
of Page with (b) those described above. The A.R.L. is the average number of items that have
to be observed before the machine is stopped, following the rules for stopping laid down.










Scheme I. Samples of size 5 are used.
(a) Stop the production if 3 consecutive sample means fall between the warning and the
action limits or if any one mean falls beyond the action limits:
    warning limits  μ ± 1.25σ/√5,
    action limits   μ ± 3.00σ/√5,
where μ and σ are the mean and standard deviation of the controlled scheme.
(b) Alternatively, we stop the machine if a run of T means fall beyond some limit, the
limit depending on T and being chosen in such a way that the average run length for the
system when in control is always the same as in (a).


Table 1. Average number of samples of n observed before stopping

  Shift in mean | Limit used |  T = 1  |  T = 2  |  T = 3  |  T = 4
  0.5σ/√n       | Inner      |   52.8  |   40.2  |   36.4  |   34.9
                | Outer      |  208.5  |  139.1  |  114.3  |  102.0
  1.0σ/√n       | Inner      |   17.4  |   12.5  |   11.8  |   11.8
                | Outer      |   54.6  |   30.7  |   24.8  |   22.8
  1.5σ/√n       | Inner      |    7.1  |    5.6  |    5.9  |    6.6
                | Outer      |   17.9  |   10.3  |    9.3  |    9.3








Table 2. Average run lengths for Scheme I

                               |          Value of k
  Scheme                       |   0    |  0.2   |  0.4   |  0.6
  (a) (Page)                   |  503   |  285   |  102   |   43
  (b) T = 1, λ_1 = 2.58        |  503   |  281   |  108   |   46
      T = 2, λ_2 = 1.45        |  503   |  226   |   78   |   35
      T = 3, λ_3 = 0.91        |  503   |  207   |   72   |   35
      T = 4, λ_4 = 0.56        |  503   |  199   |   72   |   38








Table 2 gives the A.R.L.'s before a stoppage occurs when the mean of the population is
changed by an amount kσ. It is assumed that σ remains unaltered. The values for (a) are
taken from Page (1955). The averages are not always multiples of the sample size, 5, but
this is only a reflexion of the fact that they are just averages and the average value itself
cannot always be achieved. From Table 2 it seems that scheme (a) can always be bettered
by one of the (b) type schemes, but unless the value of k were known it would be impossible
to say which value of T should be used.










Scheme II. Samples of size 5 are again used.
(a) Stop production if 4 consecutive sample means fall between the warning and the
action limits or if any one mean falls outside the action limits:
    warning limits  μ ± 1.00σ/√5,
    action limits   μ ± 3.00σ/√5.
(b) is the same as for Scheme I, adjusted so that the average run length when the system
is in control is the same as in (a). A comparison is shown in Table 3. The same characteristics
are apparent as for Scheme I and it is impossible to nominate one scheme of type (b) that is
going to be always the best.


Table 3. Average run lengths for Scheme II

                               |          Value of k
  Scheme                       |   0    |  0.2   |  0.4   |  0.6
  (a) (Page)                   |  527   |  305   |  112   |   47
  (b) T = 1, λ_1 = 2.59        |  527   |  293   |  112   |   47
      T = 2, λ_2 = 1.47        |  527   |  235   |   80   |   36
      T = 3, λ_3 = 0.92        |  527   |  214   |   73   |   36
      T = 4, λ_4 = 0.57        |  527   |  206   |   73   |   38





The results for these two schemes are in agreement with the results found by Weiler
(1953) and it can be seen from his paper that in many cases there is an optimum value of T
if the average run length is to be kept to a minimum.


Table 4. Values of λ_T in two-sided control limits

  Average samples |
  examined ...    |  500   |  400   |  300   |  200   |  100   |   50   |   25
  T = 1           | 3.090  | 3.023  | 2.935  | 2.807  | 2.576  | 2.326  | 2.054
  T = 2           | 1.850  | 1.799  | 1.732  | 1.633  | 1.452  | 1.253  | 1.029
  T = 3           | 1.261  | 1.216  | 1.156  | 1.068  | 0.906  | 0.724  | 0.515
  T = 4           | 0.888  | 0.846  | 0.791  | 0.709  | 0.556  | 0.383  | 0.179



7. In summary it seems that the use of a system of runs can effect considerable improvements
in the properties of quality control schemes. Although the schemes discussed here
can produce lower A.R.L.'s than Page's schemes, the latter have the merit of being suitable
for changes of both type (i) and type (ii) of § 1. The schemes discussed here are best for changes
of type (i) and less suitable for changes of type (ii). This would have to be borne in mind
when choosing a suitable scheme for practical purposes. It is difficult to lay down a general
rule as to the most suitable value of T to use in all circumstances, but it is apparent that
increasing T from one to two will in general effect an improvement. Increasing T further
may only produce a small improvement or even none at all. There is the further disadvantage
that large values of T imply that the minimum number of samples to be examined before
a decision is reached gets larger.

As control limits are commonly set on both sides of the nominal mean, Table 4 gives
the limits, λ_T, to be used if the average number of samples observed before a stoppage
occurs (when the population mean has its correct value) is to be the number specified at
the head of the column. The control limits are then taken as ξ ± λ_T σ/√n. A comparison with
§ 3 shows that in the two cases considered there, the A.R.L. has now been halved for the same
values of λ_T as a result of the two-ended nature of the present scheme.


REFERENCES 


BATEMAN, G. (1948). Biometrika, 35, 97.

DUDDING, B. P. & JENNETT, W. J. (1944). Quality Control Chart Technique.
London: General Electric Co. Ltd.

FELLER, W. (1950). An Introduction to Probability Theory and its Applications.
New York: Wiley and Sons.

MOSTELLER, F. (1941). Ann. Math. Statist. 12, 228.

OLMSTEAD, P. S. (1946). Ann. Math. Statist. 17, 24.

PAGE, E. S. (1955). Biometrika, 42, 243.

WEILER, H. (1953). J. Amer. Statist. Ass. 48, 216.







SIMULTANEOUS REGRESSION EQUATIONS IN 
EXPERIMENTATION 


By E. J. WILLIAMS* 
Institute of Statistics, North Carolina State College 


1. INTRODUCTION


The purpose of this paper is to discuss the determination and interpretation of simultaneous 
equations fitted to experimental data. Although little has been written on simultaneous 
equations in experimentation, their uses in economics have frequently been discussed. In 
that field, however, there is often no distinction between dependent and independent 
variables. In what is known in econometrics as a complete system of simultaneous equations, 
there are as many equations as endogenous variables, so that the equations consist of a 
linear transformation from the unknown disturbances and known exogenous variables to 
the observed variables. The treatment of simultaneous equations in econometrics is 
generally troublesome and depends on the completeness of the system of equations, and the 
identifiability of the parameters. 

In experimental work, on the other hand, there is in many situations a clear distinction 
between the dependent and independent variables. Thus, the number of equations will be 
at most equal to the number of dependent variables. In this field, too, there is a case of 
particular interest, as will be shown below, which occurs when the numbers of dependent 
and independent variables are the same. The applications of simultaneous equations to 
experimental work seem to be quite important and are much more straightforward than 
those in econometrics, yet, strangely enough, they seem to have been little discussed. The 
only published work in this field with which we are familiar is that of Box & Hunter (1954), 
but even this relates to a different situation from that considered here, and to particular 
applications in experimental design. 

We begin by discussing a simple application of simultaneous equations to experimental 
work. Then will follow the mathematical theory, after which special cases will be discussed. 
It seems that the relatively early introduction of the linear discriminant function has 
diverted the attention of statisticians from simultaneous equations; it appears that in 
many cases which are dealt with by discriminant functions, the set of simultaneous equations 
(from which the discriminant function can be derived) is more informative. 


2. A CHEMICAL EXAMPLE 


Fisher, Hansen & Norton (1955) discuss the quantitative determination of glucose and
galactose simultaneously in solutions of unknown chemical composition, by means of
optical density measurements. Without going into the technical details, which are given
in the paper referred to, we can simply state that solutions of glucose and galactose are
treated to develop a colour; the optical density of the solution to light of two different
wavelengths is then determined, and the two data thus obtained are used to estimate the
amount of each sugar in solution. It is assumed that, within the range of concentrations
studied, optical density for each sugar is proportional to amount of sugar; then use is made
of the fact that each sugar differs in its density to light of different wavelengths.


* This work was sponsored by the Office of Ordnance Research, United States Army.

Solutions containing known amounts of glucose and galactose were prepared, and the 
density at two different wavelengths (470 and 560 mμ) determined. The data enable a 
regression of density on amount of each sugar to be determined for each wavelength. These 
regressions then constitute a calibration of the apparatus, such that if optical densities 
for some unknown solution are substituted in the equations, the amount of each sugar can 
be estimated. 

Thus, if $y_1$ and $y_2$ are the optical densities at 470 and 560 mμ, respectively, and $x_1$ and $x_2$ the amounts of each sugar (in milligrammes), the regression equations may be written 

$$Y_1 = b_{11}x_1 + b_{21}x_2, \qquad Y_2 = b_{12}x_1 + b_{22}x_2. \tag{1}$$

These equations have no constant term, since the optical densities are zero at zero concentration of the sugars. In the practical use of these equations, the $y$'s will be observed values and the $x$'s predicted. If the equations are solved for this purpose, we get 

$$X_1 = b^{11}y_1 + b^{21}y_2, \qquad X_2 = b^{12}y_1 + b^{22}y_2, \tag{2}$$

where the matrix 

$$\begin{bmatrix} b^{11} & b^{21} \\ b^{12} & b^{22} \end{bmatrix}$$

is the inverse of the original matrix of regression coefficients, 

$$\begin{bmatrix} b_{11} & b_{21} \\ b_{12} & b_{22} \end{bmatrix}.$$








The equations (2) will be called inverse regression equations, and the X-values inverse 
estimates. It will be seen that in practically every calibration problem, inverse estimates 
are required, since the quantities arbitrarily assigned in the calibration are unknown in 
the application to estimation. 
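
The inverse step can be illustrated numerically. The following is a minimal sketch, assuming NumPy and taking the coefficient matrix `B` from Table 2 of §6; the sugar amounts `x_true` are hypothetical and serve only to generate a matching pair of optical densities.

```python
import numpy as np

# Direct regression coefficients (Table 2 of the paper):
# rows index the sugars (x1, x2), columns the wavelengths (y1, y2).
B = np.array([[1.2166, 1.3465],
              [2.6240, 4.7276]])

def inverse_estimates(y, B):
    """Solve y_j = sum_i x_i b_ij for the x_i, i.e. X = y B^{-1}."""
    # Solving the transposed system avoids forming B^{-1} explicitly.
    return np.linalg.solve(B.T, np.asarray(y, dtype=float))

# Hypothetical sugar amounts (mg), used only to build consistent densities.
x_true = np.array([0.10, 0.05])
y_obs = x_true @ B                  # densities implied by equations (1)
x_hat = inverse_estimates(y_obs, B)  # inverse estimates, equations (2)
```

With error-free densities the inverse estimates reproduce the assumed amounts; with real observations they carry the sampling errors discussed in the following sections.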

Problems of this kind must be of frequent occurrence in quantitative chemical analysis 
and in other fields. The determination of the accuracy with which estimates can be made 
from such equations is an important practical problem. We now give the mathematical 
derivation of sampling errors and fiducial intervals, before returning to the arithmetical 
analysis of the example just discussed. 


3. SIMULTANEOUS EQUATIONS IN GENERAL 
In general, we may consider that we have $n$ observations on each of $p$ independent variables $x_i$ $(i = 1, 2, \ldots, p)$ and $q$ dependent variables $y_j$ $(j = 1, 2, \ldots, q)$, and that we require to estimate the $y_j$ in terms of the $x_i$ or vice versa. Then we may determine $q$ regression equations 

$$Y_j = \sum_i b_{ij}x_i \qquad (j = 1, 2, \ldots, q) \tag{3}$$

in which for simplicity the variables are measured from their means so that the constant terms vanish. 
We adopt the following notation: 
    $t_{hi}$   sum of products of $x_h$ and $x_i$ ($n-1$ degrees of freedom), 
    $u_{jk}$   total sum of products of $y_j$ and $y_k$ ($n-1$ degrees of freedom), 
    $v_{jk}$   residual sum of products of $y_j$ and $y_k$ ($n-p-1$ degrees of freedom), 
    $T = (t_{hi})$, $T^{-1} = (t^{hi})$, 
    $U = (u_{jk})$, 
    $V = (v_{jk})$, $V^{-1} = (v^{jk})$. 










98 Simultaneous regression equations in experimentation 


Lower case $x_i$ or $y_j$ will denote either observed or potentially observed (though sometimes actually unknown) quantities, while capital $X_i$ or $Y_j$ will denote estimates based on the observed quantities. 


4. DIRECT ESTIMATION 


If one of the $y_j$ or a linear combination of them is to be estimated from the equations (3), the procedure is straightforward. For the variances of the regression coefficients we have the familiar results 

$$(n-p-1)\,V(b_{ij}) = v_{jj}t^{ii}$$

and generally 

$$(n-p-1)\operatorname{cov}(b_{hj}, b_{ik}) = v_{jk}t^{hi} = (n-p-1)\operatorname{cov}(b_{ij}, b_{hk}). \tag{4}$$

Hence, for the variance of an estimate, we have 

$$(n-p-1)\,V(Y_j) = v_{jj}\Bigl\{(1/n) + \sum_h\sum_i t^{hi}x_hx_i\Bigr\} \tag{5}$$

and in general for the covariance of any two estimates, 

$$(n-p-1)\operatorname{cov}(Y_j, Y_k) = v_{jk}\Bigl\{(1/n) + \sum_h\sum_i t^{hi}x_hx_i\Bigr\}, \tag{6}$$


the term 1/n being included to allow for the fact that the variables are measured from their 
means. 

In order to know how much a new observation $y_j$ will vary about the predicted value, we need the variance about an estimate, as well as the variance of the estimate. We have 

$$(n-p-1)\,V(y_j - Y_j) = Hv_{jj} \tag{7}$$

and generally, 

$$(n-p-1)\operatorname{cov}(y_j - Y_j,\, y_k - Y_k) = Hv_{jk}, \tag{8}$$

where 

$$H = 1 + (1/n) + \sum_h\sum_i t^{hi}x_hx_i. \tag{9}$$
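
The factor $H$ and the resulting covariance matrix of a new observation about its estimate can be computed as follows. This is a minimal sketch assuming NumPy; the function names and the numerical values of `T`, `V`, `x` and `n` are illustrative assumptions, not data from the paper.

```python
import numpy as np

def h_factor(T, x, n):
    """H = 1 + 1/n + sum_h sum_i t^{hi} x_h x_i, as in equation (9).

    x holds the independent variables measured from their means.
    """
    x = np.asarray(x, dtype=float)
    return 1.0 + 1.0 / n + x @ np.linalg.solve(T, x)

def covariance_about_estimate(V, T, x, n):
    """Estimated covariance matrix of y_j - Y_j, equations (7) and (8)."""
    p = T.shape[0]
    return h_factor(T, x, n) * V / (n - p - 1)

# Hypothetical inputs for illustration only.
T = np.array([[0.25, 0.075],
              [0.075, 0.25]])       # sums of products of the x's
V = np.array([[0.0032, 0.0030],
              [0.0030, 0.0067]])    # residual sums of products of the y's
x = np.array([0.10, 0.05])
n = 28
h = h_factor(T, x, n)
cov = covariance_about_estimate(V, T, x, n)
```

Dropping the leading 1 in `h_factor` would give the factor appearing in (5) and (6), the variance of the estimate itself.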


If it is required to estimate a linear combination of the $y_j$, for example 

$$y_a = \sum_j l_jy_j,$$

the regression coefficients are linear combinations of the original coefficients, viz. 

$$b_{ia} = \sum_j l_jb_{ij},$$

and the regression equation for estimating $y_a$ may be written 

$$Y_a = \sum_i b_{ia}x_i.$$

The variance of an estimate is given by 

$$(n-p-1)\,V(Y_a) = \sum_j\sum_k l_jl_kv_{jk}\Bigl\{(1/n) + \sum_h\sum_i t^{hi}x_hx_i\Bigr\}. \tag{10}$$


A special case of a linear combination of the dependent variables is Hotelling's 'most predictable criterion'. For a linear combination with coefficients $a_j$, the residual sum of squares after fitting the regression on the $x_i$ is 

$$\sum_j\sum_k a_ja_kv_{jk} \tag{11}$$

and the total sum of squares is 

$$\sum_j\sum_k a_ja_ku_{jk}. \tag{12}$$










The linear combination which minimizes the ratio of (11) to (12) will clearly be an estimate of that linear combination which is least affected by departure from regression, and has been designated by Hotelling the most predictable criterion. The coefficients $a_j$ will be found as one of the latent vectors of the matrix $V^{-1}U$. Whether this linear combination has any relevance to the interpretation of the data will depend on the nature of the problem. 
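
The latent-vector calculation can be sketched as follows, assuming NumPy. The function name is an assumption; the matrices `U` and `V` are the total and residual lines of Table 1 in §6, used here purely as convenient numbers (those sums are taken about the origin rather than the means).

```python
import numpy as np

def most_predictable_criterion(U, V):
    """Coefficients a minimising (a' V a) / (a' U a), and the minimum ratio."""
    # Minimising the ratio of (11) to (12) leads to V a = lambda U a, so the
    # minimiser is the latent vector of V^{-1} U with the largest latent root.
    roots, vectors = np.linalg.eig(np.linalg.solve(V, U))
    k = int(np.argmax(roots.real))
    a = vectors[:, k].real
    return a, float(a @ V @ a) / float(a @ U @ a)

U = np.array([[2.573420, 4.210263],
              [4.210263, 7.002538]])   # total sums of squares and products
V = np.array([[0.003167, 0.002996],
              [0.002996, 0.006733]])   # residual sums of squares and products
a, ratio = most_predictable_criterion(U, V)
```

The scale of `a` is arbitrary; only the direction of the latent vector matters.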


5. INVERSE ESTIMATION 


As mentioned earlier, we are most often interested in using a set of simultaneous regression 
equations inversely for estimating values of the independent variables from observed values 
of the dependent variables. This situation arises frequently, for example, in calibration 
experiments, as the example discussed above shows. Now in order that the regression 
equations may be solved for the independent variables, it is necessary, generally speaking, 
that the number of equations equal the number of independent variables. If there are 
fewer equations than independent variables, they cannot be solved, and all that can be 
determined are certain relationships among the estimated values of the independent 
variables. On the other hand, if there are more equations than unknown independent 
variables, we have redundant information; however, by an adaptation of the method of 
least squares, valid estimates of the unknowns may be determined. In this case the discrepancies of the individual equations from these estimates provide a measure of the consistency of the different equations and hence of the different dependent variables. We shall consider each of these cases in turn. 


(a) $p = q$. 
In this case the regression equations (3) may be solved directly to give the estimates of the $x_i$, which we denote, without risk of confusion with direct estimates, by $X_i$. The solutions are 

$$X_i = \sum_j b^{ji}y_j, \tag{13}$$

where the $b^{ji}$ are the elements of the matrix inverse to the square matrix $B = (b_{ij})$. 

We note that in the matrix B, rows correspond to x-variables and columns to y-variables, 
while in B-!, rows correspond to y-variables and columns to x-variables. Thus, for either 
direct or inverse regression equations, the regression coefficients corresponding to any 
predictand are read down the columns. 

We shall show below how tolerance limits for values, corresponding to the estimates $X_i$, may be determined by means of the F-test. First of all, however, it is of interest to determine approximate standard errors for these estimates. These standard errors will be applicable when the estimated regression coefficients are large compared with their standard errors, and the inverse regression coefficients are likewise large compared with their standard errors. This second condition requires in particular that the matrix $B$ be not almost singular. 


Now we have $BB^{-1} = I$; hence, on taking differentials, and multiplying the results by $B^{-1}$, we find 

$$dB^{-1} = -B^{-1}(dB)B^{-1}, \tag{14}$$

whence 

$$db^{ji} = -\sum_h\sum_k b^{jh}b^{ki}\,db_{hk}. \tag{15}$$
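
The first-order relation (14) can be checked numerically. This is a small sketch assuming NumPy; `B` is taken from Table 2 of §6 and the perturbation `dB` is an arbitrary, hypothetical choice.

```python
import numpy as np

B = np.array([[1.2166, 1.3465],
              [2.6240, 4.7276]])
# Arbitrary small perturbation of the regression coefficients (hypothetical).
dB = 1e-6 * np.array([[1.0, -2.0],
                      [0.5, 3.0]])

B_inv = np.linalg.inv(B)
exact_change = np.linalg.inv(B + dB) - B_inv   # actual change in the inverse
first_order = -B_inv @ dB @ B_inv              # right-hand side of (14)
```

The two arrays agree up to terms of second order in `dB`, which is why the resulting standard errors are only approximate when the coefficients are poorly determined.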




The equations (14) and (15) represent a linear transformation of the differentials $db_{hk}$. Taking the direct product (van der Waerden, 1931) of such a transformation and its transposed, we have 

$$dB^{-1} \times dB'^{-1} = (B^{-1}(dB)B^{-1}) \times (B'^{-1}(dB')B'^{-1}) = (B^{-1} \times B'^{-1})(dB \times dB')(B^{-1} \times B'^{-1}). \tag{16}$$

Now each of the direct products in equation (16) is a $p^2 \times p^2$ matrix whose typical elements are products of two regression coefficients or differentials. For instance the typical element of $dB \times dB'$ is $db_{hj}\,db_{ik}$. 


If we take expectations of each side of equation (16), we get on the left-hand side the matrix of variances and covariances of the $b^{ji}$, while on the right-hand side the middle factor gives the matrix of variances and covariances of the $b_{ij}$. Now, as we have seen, the appropriate estimate of the covariance of $b_{hj}$ and $b_{ik}$ is 

$$t^{hi}v_{jk}/(n-p-1).$$

If we make a suitable permutation of rows and columns, the expected value of the middle factor therefore becomes the direct product 

$$T^{-1} \times V/(n-p-1),$$

of two $p \times p$ matrices. 


Denoting the estimated expected value of the left-hand side by $W$, we find, again after suitable permutations of rows and columns, that 

$$(n-p-1)W = (B^{-1} \times B'^{-1})(T^{-1} \times V)(B'^{-1} \times B^{-1}) = (B^{-1}T^{-1}B'^{-1}) \times (B'^{-1}VB^{-1}) = M^{-1} \times Q^{-1}, \tag{17}$$

where $M = B'TB$, and $Q = BV^{-1}B'$, so that $M^{-1} = B^{-1}T^{-1}B'^{-1}$, and $Q^{-1} = B'^{-1}VB^{-1}$. 

This result gives in particular 

$$(n-p-1)\operatorname{cov}(b^{ji}, b^{j'i'}) = m^{jj'}q^{ii'} = \sum_h\sum_{h'} t^{hh'}b^{jh}b^{j'h'}\sum_k\sum_{k'} v_{kk'}b^{ki}b^{k'i'} = (n-p-1)\operatorname{cov}(b^{j'i}, b^{ji'}). \tag{18}$$

These results are, of course, approximate and will often be inaccurate; their interest lies in the fact that the expressions found are similar to those occurring in the exact analysis. 
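
The factorization in (17) rests on the mixed-product rule for direct (Kronecker) products. The following sketch, assuming NumPy and the matrices of Table 2 in §6, checks the rule numerically without any permutation of rows and columns.

```python
import numpy as np

B = np.array([[1.2166, 1.3465],
              [2.6240, 4.7276]])
T = np.array([[0.2500, 0.0750],
              [0.0750, 0.2500]])
V = 1e-6 * np.array([[3167.0, 2996.0],
                     [2996.0, 6733.0]])

B_inv = np.linalg.inv(B)
T_inv = np.linalg.inv(T)

# Left-hand side of (17): product of three direct products.
lhs = np.kron(B_inv, B_inv.T) @ np.kron(T_inv, V) @ np.kron(B_inv.T, B_inv)

# Right-hand side: direct product of the two p x p matrices M^{-1} and Q^{-1}.
M_inv = B_inv @ T_inv @ B_inv.T
Q_inv = B_inv.T @ V @ B_inv
rhs = np.kron(M_inv, Q_inv)
```

Working with `M_inv` and `Q_inv` separately avoids ever forming the $p^2 \times p^2$ matrix.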


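We may now determine approximate variances and covariances of estimates $X_i$ based on observations $y_j$. 

$$(n-p-1)\,V(X_i) = (n-p-1)\,V\Bigl(\sum_j b^{ji}y_j\Bigr)$$

$$= \{1 + (1/n)\}\sum_j\sum_k v_{jk}b^{ji}b^{ki} + \sum_j\sum_{j'}\sum_h\sum_{h'} y_jy_{j'}t^{hh'}b^{jh}b^{j'h'}\sum_k\sum_{k'} v_{kk'}b^{ki}b^{k'i}$$

$$= \sum_j\sum_k v_{jk}b^{ji}b^{ki}\Bigl\{1 + (1/n) + \sum_h\sum_{h'} t^{hh'}X_hX_{h'}\Bigr\}. \tag{19}$$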







This result follows from the formula for the approximate variance of a product, and from the fact that the $b^{ji}$ and the $y_j$ are independent. Similarly, to the same degree of approximation, 

$$(n-p-1)\operatorname{cov}(X_i, X_{i'}) = \sum_j\sum_k v_{jk}b^{ji}b^{ki'}\Bigl\{1 + (1/n) + \sum_h\sum_{h'} t^{hh'}X_hX_{h'}\Bigr\}. \tag{20}$$

The covariance matrix of the $X_i$ may be written $HQ^{-1}/(n-p-1)$, where $Q^{-1} = B'^{-1}VB^{-1}$, as above, and $H$ is here a function of the estimates rather than of observed values as defined in (9). It may be noted that these results are analogous to those found in direct estimation of an observation $y_j$. There we have 

$$(n-p-1)\,V(y_j - Y_j) = v_{jj}H,$$

and 

$$(n-p-1)\operatorname{cov}(y_j - Y_j,\, y_k - Y_k) = v_{jk}H,$$

where the $x_i$ are now observed quantities, the $Y_j$ are regression estimates, and the $y_j$ are new observations, not used in determining the regression. 

The exact determination of sampling variation is not much more complicated. We may find simultaneous limits for the unknown quantities $x_i$ in the following way. The ratio 

$$\frac{n-2p}{p}\,\frac{\sum_j\sum_k v^{jk}\bigl(y_j - \sum_i b_{ij}x_i\bigr)\bigl(y_k - \sum_i b_{ik}x_i\bigr)}{H} \tag{21}$$

is distributed as $F$ with $p$ and $n-2p$ degrees of freedom. By substituting various sets of values of the $x_i$ in the formula we can determine for which sets the associated value of $F$ is non-significant, and hence which sets are concordant with the data. The range of concordant sets of the $x_i$ defines a fiducial region for the values. 
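
The search over trial values described above might be coded as follows. This is a sketch assuming NumPy; the function names, the sample size `n = 30`, the observation `y_new` and the critical value 3.37 (roughly the tabulated 5 % point of $F$ for 2 and 26 degrees of freedom) are illustrative assumptions, and the matrices are borrowed from Table 2 of §6 as if they referred to variables measured from their means.

```python
import numpy as np

def region_statistic(y, x, B, V, T, n):
    """Ratio (21) for a trial set of values x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    p = B.shape[0]
    d = y - x @ B                                    # y_j - sum_i b_ij x_i
    H = 1.0 + 1.0 / n + x @ np.linalg.solve(T, x)    # equation (9)
    return (n - 2 * p) / p * (d @ np.linalg.solve(V, d)) / H

def concordant(y, x, B, V, T, n, f_crit):
    """True if the trial x is not rejected at the chosen level."""
    return region_statistic(y, x, B, V, T, n) <= f_crit

B = np.array([[1.2166, 1.3465], [2.6240, 4.7276]])
T = np.array([[0.2500, 0.0750], [0.0750, 0.2500]])
V = 1e-6 * np.array([[3167.0, 2996.0], [2996.0, 6733.0]])
n = 30                                   # hypothetical sample size
y_new = np.array([0.25, 0.37])           # hypothetical observation
X = np.linalg.solve(B.T, y_new)          # inverse estimates

at_estimate = region_statistic(y_new, X, B, V, T, n)
shifted = region_statistic(y_new, X + np.array([0.05, 0.0]), B, V, T, n)
```

The statistic is zero at the inverse estimates themselves and grows as the trial values move away; the fiducial region is the set of trial values for which `concordant` returns true.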
Now since we may write 

$$y_j = \sum_i b_{ij}X_i,$$

the $y_j$ being observations and the $X_i$ estimates, we have 

$$\sum_j\sum_k v^{jk}\Bigl(y_j - \sum_i b_{ij}x_i\Bigr)\Bigl(y_k - \sum_i b_{ik}x_i\Bigr) = \sum_h\sum_i (X_h - x_h)(X_i - x_i)\sum_j\sum_k v^{jk}b_{hj}b_{ik} = \sum_h\sum_i (X_h - x_h)(X_i - x_i)\,q_{hi}, \tag{22}$$

where 

$$q_{hi} = \sum_j\sum_k v^{jk}b_{hj}b_{ik}.$$

Since $q_{hi}$ is a typical element of the matrix $Q = BV^{-1}B'$, (22) may be written $(X - x)BV^{-1}B'(X' - x')$. 
Hence, the simultaneous fiducial limits for the values $x_i$ are given by the solution (if real) of 

$$\frac{n-2p}{p}\,\frac{(X - x)BV^{-1}B'(X' - x')}{H} = F. \tag{23}$$

If limits for a single value $x_i$ are required, we have 

$$V(X_i) = Hq^{ii}/(n-p-1). \tag{24}$$

Now since $Q = BV^{-1}B'$, $Q^{-1} = B'^{-1}VB^{-1}$, so that 

$$q^{ii} = \sum_j\sum_k v_{jk}b^{ji}b^{ki}.$$








Hence, with 1 and $n-p-1$ degrees of freedom, 

$$F = \frac{(n-p-1)(X_i - x_i)^2}{H\sum_j\sum_k v_{jk}b^{ji}b^{ki}}. \tag{25}$$

Note that the variance estimate given by (24) differs from the approximate estimate given above in (19) by the replacement of calculated quantities $X_i$ by unknowns $x_i$. In practice, since the $x_i$ are unknown, the approximate variance estimate based on the $X_i$ would need to be used to give fiducial limits for a single $x_i$. 


(b) $p < q$ 

In this case we have more equations than unknowns. We have a choice here either of omitting $q-p$ of the equations (provided we can decide from prior considerations which are least useful), or of using the additional information given by the equations to test the consistency of the relationships involving the different dependent variables. This latter aspect is the one that we shall examine. 

If an observation of a set $y_j$ $(j = 1, 2, \ldots, q)$ of dependent variables is to be used to estimate a set $x_i$ $(i = 1, 2, \ldots, p)$, we may so determine the estimate that it has minimum (estimated) variance. Now since the estimated covariance of $y_j$ and $y_k$ is proportional to $v_{jk}$, the quantity to be minimized, with respect to the $x_i$, is 

$$\sum_j\sum_k v^{jk}\Bigl(y_j - \sum_i b_{ij}x_i\Bigr)\Bigl(y_k - \sum_i b_{ik}x_i\Bigr).$$

If we put, as in (a), $Q = BV^{-1}B'$, so that 

$$q_{hi} = \sum_j\sum_k v^{jk}b_{hj}b_{ik},$$

and also put 

$$P = BV^{-1}y \tag{26}$$

so that 

$$p_i = \sum_j\sum_k v^{jk}b_{ij}y_k,$$

we find for the normal equations 

$$QX = P, \tag{27}$$

i.e. $\sum_h q_{hi}X_h = p_i$, so that 

$$X = Q^{-1}P, \tag{28}$$

or $X_h = \sum_i q^{hi}p_i$. 
i 


These results are similar to those found for the case $p = q$, except that here the matrix $B$ does not possess an inverse, so that the estimates need to be expressed in terms of the matrices $P$ and $Q$. 
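
The normal equations (27) can be solved directly. The sketch below assumes NumPy; the function name and the matrices `B` (2 independent, 3 dependent variables) and `V` are hypothetical values chosen only to illustrate the case $p < q$.

```python
import numpy as np

def inverse_estimate_overdetermined(y, B, V):
    """Return X = Q^{-1} P with Q = B V^{-1} B' and P = B V^{-1} y."""
    V_inv_Bt = np.linalg.solve(V, B.T)   # V^{-1} B', shape (q, p)
    Q = B @ V_inv_Bt                     # p x p
    P = V_inv_Bt.T @ y                   # B V^{-1} y, since V is symmetric
    return np.linalg.solve(Q, P), Q, P

# Hypothetical calibration with p = 2 and q = 3.
B = np.array([[1.0, 0.5, 2.0],
              [0.3, 1.2, 0.7]])
V = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.0, 0.2],
              [0.1, 0.2, 1.5]])
x_true = np.array([0.4, 1.1])
y = x_true @ B                           # perfectly consistent observations
X, Q, P = inverse_estimate_overdetermined(y, B, V)
```

When the observed $y_j$ are exactly consistent, as here, the estimate reproduces the assumed values whatever $V$ is; $V$ only matters when the equations disagree.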


As in the case $p = q$, the estimated covariance matrix of the $X_i$ is 

$$\frac{HQ^{-1}}{n-p-1}.$$

Now we may test the consistency of the $q$ equations by means of the departures of the observed $y_j$ values from the estimates provided by inserting the $X_i$ in the equations. The criterion is 

$$\frac{n-p-q}{q-p}\,\frac{\sum_j\sum_k v^{jk}\bigl(y_j - \sum_i b_{ij}X_i\bigr)\bigl(y_k - \sum_i b_{ik}X_i\bigr)}{H}$$

which is distributed as $F$ with $q-p$ and $n-p-q$ degrees of freedom. 
















This may be otherwise written 

$$\frac{n-p-q}{(q-p)H}\Bigl\{\sum_j\sum_k v^{jk}y_jy_k - \sum_h\sum_i q_{hi}X_hX_i\Bigr\}.$$


If the value of $F$ is not significant, there is no evidence for regarding the equations as inconsistent, and fiducial limits may be determined for the $x_i$. For these we have 

$$F = \frac{n-p-q}{pH}\sum_h\sum_i q_{hi}(X_h - x_h)(X_i - x_i) \tag{30}$$

with $p$ and $n-p-q$ degrees of freedom. This may be written in the alternative form 

$$F = \frac{n-p-q}{pH}\sum_h\sum_{h'} q^{hh'}\Bigl(p_h - \sum_i x_iq_{hi}\Bigr)\Bigl(p_{h'} - \sum_i x_iq_{h'i}\Bigr). \tag{30'}$$

By means of this criterion, the concordance of any set of $x_i$ with the data may be established. 
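
The equivalence of the alternative forms can be confirmed numerically. This is a sketch assuming NumPy; all matrices and the vectors `y` and `x_trial` are hypothetical, and the common factors $(n-p-q)$ and $H$ are left out because they appear identically on both sides.

```python
import numpy as np

# Hypothetical calibration with p = 2 and q = 3.
B = np.array([[1.0, 0.5, 2.0],
              [0.3, 1.2, 0.7]])
V = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.0, 0.2],
              [0.1, 0.2, 1.5]])
y = np.array([0.75, 1.50, 1.60])      # not exactly consistent with any x

V_inv = np.linalg.inv(V)
Q = B @ V_inv @ B.T
P = B @ V_inv @ y
X = np.linalg.solve(Q, P)

# Consistency criterion: residual quadratic form, computed two ways.
resid = y - X @ B
direct = resid @ V_inv @ resid
shortcut = y @ V_inv @ y - X @ Q @ X

# Fiducial criterion for a trial x: forms (30) and (30').
x_trial = np.array([0.5, 1.0])
form_30 = (X - x_trial) @ Q @ (X - x_trial)
form_30p = (P - Q @ x_trial) @ np.linalg.solve(Q, P - Q @ x_trial)
```

The second form of each pair avoids recomputing residuals for every observation once $P$ and $Q$ are available.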
In the particular case when $p = 1$, the solution of the equations gives the discriminant function for assigning a value of $x_1$ on the basis of observations of the $q$ variables $y_1, y_2, \ldots, y_q$. The discriminant function is 

$$X_1 = p_1/q_{11} = \sum_j\sum_k v^{jk}b_{1j}y_k \Big/ \sum_j\sum_k v^{jk}b_{1j}b_{1k}. \tag{31}$$

To test the consistency of any set of observations $y_j$, the criterion is 

$$F = \frac{n-q-1}{q-1}\,\frac{\sum_j\sum_k v^{jk}y_jy_k - p_1^2/q_{11}}{1 + 1/n + X_1^2/t_{11}}, \tag{32}$$

with $q-1$ and $n-q-1$ degrees of freedom. 

It should be remarked here that this is a test, not of the discriminant function, which 
has been established from previous data, but of the consistency of the present set of obser- 
vations. A significant result may indicate either that the values of y; are not consistent 
among themselves, or that the discriminant function determined from previous data does 
not apply to the present observations. 

(c) $p > q$ 

In this case we have fewer equations than unknowns, so that estimates of the unknown $x_i$ cannot be determined. The most that can be done is to find a relationship among $p-q+1$ of the estimates $X_i$. In many cases such a relationship may be all that is required, as is indicated in the example given below. 

Suppose that we wish to eliminate $X_1, X_2, \ldots, X_{q-1}$, and to determine the relationship among $X_q, X_{q+1}, \ldots, X_p$. The determinant of the first $q-1$ rows of $B$ and the $q-1$ columns resulting from omitting column $j$ will be denoted by $(-1)^{j-1}B_j$. Then it is readily shown that the required relationship is 

$$\sum_{j=1}^{q} B_jy_j = \sum_{i=q}^{p} X_i\sum_{j=1}^{q} B_jb_{ij}. \tag{33}$$

The fiducial limits for the corresponding relationship among the $x_i$ can only approximately be determined. 
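
The elimination by signed minors can be illustrated on a small case. The sketch below assumes NumPy; the function name and the matrix `B` (with $p = 3$, $q = 2$) are hypothetical, and the code assumes $q \ge 2$.

```python
import numpy as np

def elimination_weights(B):
    """Signed minors B_j of the first q-1 rows of B (p x q, with p > q >= 2)."""
    q = B.shape[1]
    top = B[:q - 1, :]
    # Zero-based j gives the sign (-1)**j, i.e. (-1)**(j-1) for one-based j.
    return np.array([(-1) ** j * np.linalg.det(np.delete(top, j, axis=1))
                     for j in range(q)])

B = np.array([[1.0, 2.0],
              [3.0, 5.0],
              [4.0, 1.0]])          # hypothetical coefficients b_ij
w = elimination_weights(B)          # weights applied to y_1, ..., y_q
coef = B @ w                        # coefficient of each X_i after weighting
```

The first $q-1$ entries of `coef` vanish, so the weighted sum of the $y_j$ involves only $X_q, \ldots, X_p$; here it reads $2y_1 - y_2 = X_2 + 7X_3$.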







The fact that $p > q$ does not, however, prevent simultaneous fiducial limits for the $x_i$ from being found. The criterion, distributed as $F$ with $q$ and $n-p-q$ degrees of freedom, from which simultaneous fiducial limits may be derived, is 

$$\frac{n-p-q}{qH}\sum_j\sum_k v^{jk}\Bigl(y_j - \sum_i b_{ij}x_i\Bigr)\Bigl(y_k - \sum_i b_{ik}x_i\Bigr) = \frac{n-p-q}{qH}\sum_h\sum_i q_{hi}(X_h - x_h)(X_i - x_i), \tag{34}$$

where $q_{hi}$ is the typical element of the matrix $Q$ defined above. Here $Q$, though it is a $p \times p$ matrix, is of rank $q$. 


6. DISCUSSION OF THE CHEMICAL EXAMPLE 


The original data of the experiment discussed in §2 are given by Fisher, Hansen & Norton (1955) in their Table 1, so are not reproduced here. 


Table 1. Analyses of variance and covariance of optical density measurements at 470 mμ ($y_1$) and at 560 mμ ($y_2$) (Fisher, Hansen & Norton's data) 

                             Degrees of       Sums of squares and products
                             freedom          $y_1^2$      $y_1y_2$     $y_2^2$
Regression on $x_1$, $x_2$        2           2.570253     4.207267     6.995805
Residual                         26           0.003167     0.002996     0.006733
Total                            28           2.573420     4.210263     7.002538











Table 2. Matrices of sums of squares and products, and of regression coefficients 

          $T$                    $10^6V$                   $B$
   0.2500    0.0750         3167     2996          1.2166    1.3465
   0.0750    0.2500         2996     6733          2.6240    4.7276

        $T^{-1}$                 $V^{-1}$                 $B^{-1}$
   4.3956   -1.3187         545.3   -242.6         2.1311   -0.6070
  -1.3187    4.3956        -242.6    256.5        -1.1829    0.5484





Fisher et al. (1955) fitted quadratic regression equations to their data, but as we found that the quadratic terms were significant only at the 5% level for optical densities at 560 mμ ($y_2$), we have ignored these terms and fitted only linear regressions. The analyses of variance and covariance of $y_1$ and $y_2$ are shown in Table 1, and the $B$, $T$ and $V$ matrices and their inverses in Table 2. 










Thus we see from Table 2 that the direct regression equations are 

$$Y_1 = 1.2166x_1 + 2.6240x_2, \qquad Y_2 = 1.3465x_1 + 4.7276x_2 \tag{35}$$

and the inverse equations are 

$$X_1 = 2.1311y_1 - 1.1829y_2, \qquad X_2 = -0.6070y_1 + 0.5484y_2 \tag{36}$$

in agreement with the results of Fisher et al. (1955). 

The direct equations are less useful than the inverse ones. Since in this example the numbers of dependent and independent variables are equal, no test for consistency is possible, but we can derive fiducial limits for the values of $x_1$ and $x_2$ corresponding to observed values $y_1$ and $y_2$. 


Table 3. Matrix products required in estimating variances 

   $M^{-1} = B^{-1}T^{-1}B'^{-1}$        $10^6Q^{-1} = 10^6B'^{-1}VB^{-1}$
      24.995   -15.032                      8699    -2812
     -15.032     9.183                     -2812     1197











Since the inverse regression coefficients, as well as the direct coefficients, are likely to be well determined, we may calculate their approximate standard errors. Table 3 gives the matrices $B^{-1}T^{-1}B'^{-1}$ and $B'^{-1}VB^{-1}$ required in these calculations. Then, for example, the variance of $b^{21}$ is obtained using the second diagonal term of $B^{-1}T^{-1}B'^{-1}$ and the first diagonal term of $B'^{-1}VB^{-1}$: 

$$9.183 \times 10^{-6} \times 8699/26 = 0.003072,$$

so that the standard error of $b^{21}$ is 0.055. The standard errors of the coefficients may be set out as follows: 

$$\begin{bmatrix} 0.091 & 0.034 \\ 0.055 & 0.021 \end{bmatrix}$$

For general purposes, of course, the covariances as well as the variances of the regression coefficients will be of interest. 
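
These standard errors can be recomputed from Table 2. The following is a sketch assuming NumPy; it applies the approximate result (18) with 26 residual degrees of freedom, and the variable names are assumptions made for the illustration.

```python
import numpy as np

B = np.array([[1.2166, 1.3465], [2.6240, 4.7276]])
T = np.array([[0.2500, 0.0750], [0.0750, 0.2500]])
V = 1e-6 * np.array([[3167.0, 2996.0], [2996.0, 6733.0]])
resid_df = 26

B_inv = np.linalg.inv(B)
M_inv = B_inv @ np.linalg.inv(T) @ B_inv.T   # B^{-1} T^{-1} B'^{-1}
Q_inv = B_inv.T @ V @ B_inv                  # B'^{-1} V B^{-1}

# Approximate variance of the element in row j, column i of B^{-1}:
# (j-th diagonal term of M_inv) * (i-th diagonal term of Q_inv) / resid_df.
se = np.sqrt(np.outer(np.diag(M_inv), np.diag(Q_inv)) / resid_df)
```

The array `se` is laid out like $B^{-1}$, so `se[1, 0]` is the standard error of $b^{21}$.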

In determining the approximate variance of an estimate $X_i$, since the regression is through the origin rather than the point of means, the actual values of the $y_j$ rather than departures from means are used, and the term $1/n$ is omitted from the variance estimates, in equation (19). 

Thus, approximately, 

$$V(X_1) = \frac{10^{-6} \times 8699}{26}\,(1 + 24.99y_1^2 - 30.06y_1y_2 + 9.18y_2^2) = \frac{10^{-6} \times 8699}{26}\,(1 + 4.396X_1^2 - 2.637X_1X_2 + 4.396X_2^2)$$

with similar results for $\operatorname{cov}(X_1, X_2)$ and $V(X_2)$. 
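
For a given pair of optical densities the approximate variance can be evaluated directly. This is a sketch assuming NumPy and the matrices of Table 2; the function name and the observation `y_new` are hypothetical.

```python
import numpy as np

B = np.array([[1.2166, 1.3465], [2.6240, 4.7276]])
T = np.array([[0.2500, 0.0750], [0.0750, 0.2500]])
V = 1e-6 * np.array([[3167.0, 2996.0], [2996.0, 6733.0]])

def approx_variance(y, i, B, T, V, resid_df):
    """Approximate V(X_i) from (19), with the 1/n term omitted because the
    regressions pass through the origin."""
    B_inv = np.linalg.inv(B)
    X = np.asarray(y, dtype=float) @ B_inv       # inverse estimates
    q_ii = (B_inv.T @ V @ B_inv)[i, i]
    H = 1.0 + X @ np.linalg.solve(T, X)
    return q_ii * H / resid_df

y_new = np.array([0.25, 0.37])                   # hypothetical densities
var_x1 = approx_variance(y_new, 0, B, T, V, resid_df=26)
se_x1 = np.sqrt(var_x1)
```

Passing `i=1` gives the corresponding variance for the second sugar.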


7. AN EXAMPLE OF INVERSE ESTIMATION WHERE p > q 


A study of the pulping properties of eucalypt woods is reported by Cohen & Mackney (1951). The object of the studies was to determine a treatment which would produce pulp of the required lignin content, from wood with certain characteristics. The percentage of the wood material soluble in hot water (hot-water solubles, $x_1$) was determined for each wood sample, which was then divided into four parts, each being pulped with varying amounts of active alkali ($x_2$ % of the wood weight). The same levels of active alkali were repeated for each sample, so that the two independent variables were uncorrelated. The lignin content of the resulting pulp was measured in terms of a 'permanganate number', and its logarithm to base 10 ($y$) taken as the dependent variable. The data are shown in Table 4. 


Table 4. Data from a study of pulping properties of eucalypt woods 

$x_1$ = percentage hot-water solubles; $x_2$ = percentage active alkali used in pulping; $y$ = log permanganate number 

                      $y$ at $x_2$ =
    $x_1$         15        17        19        21
     5.97       1.425     1.250     1.170     1.124
     8.00       1.641     1.418     1.230     1.164
     8.51       1.655     1.384     1.334     1.164
     4.51       1.486     1.272     1.185     1.124
     3.15       1.250     1.146     1.086     1.033
     6.79       1.498     1.330     1.233     1.161
     9.20       1.442     1.255     1.146     1.093
    10.00       1.507     1.332     1.220     1.199
    10.94       1.667     1.458     1.258     1.173
     6.35       1.391     1.207     1.100     1.079
    13.19       1.734     1.535     1.326     1.201
     9.52       1.500     1.281     1.152     1.104
     9.46       1.610     1.425     1.283     1.204
     3.17       1.204     1.130     1.083     1.004
     3.53       1.236     1.149     1.061     1.025

    Total      22.246    19.572    17.867    16.852

Means: $x_1$ = 7.486, $x_2$ = 18, $y$ = 1.2756. Grand total of $y$ = 76.537. 








This example does not illustrate the use of simultaneous equations, but it does show how inverse estimation is possible when there are more independent than dependent variables. The appropriate regression is that of $y$ on $x_1$ and $x_2$; however, what is required from the data is an estimate of the relationship of $x_2$ to $x_1$, corresponding to a fixed value of $y$; in other words, the alkali requirement $x_2$ which will result on the average in a given lignin content $Y$, when the hot-water solubles figure $x_1$ is known. 











The relevant sums of squares and products are shown in Table 5. The regression equation is found to be 

$$Y = 2.123 + 0.0301x_1 - 0.0596x_2. \tag{37}$$

The analysis of variance in Table 6 shows this regression to be highly significant, and the residual variance to be 0.005804. Since the 1 % point of $F$ with 1 and 57 degrees of freedom is 7.102, the 99 % fiducial boundary for the regression relationship is 

$$(Y - 2.123 - 0.0301x_1 + 0.0596x_2)^2 = 7.102 \times 0.005804\,\Bigl\{\frac{1}{60} + \frac{(x_1 - 7.486)^2}{516.315} + \frac{(x_2 - 18)^2}{300}\Bigr\}. \tag{38}$$

The lignin content required for the pulp corresponds to a 'permanganate number' of 15 (i.e. $Y = 1.176$); this value inserted in the equation gives the relationship 

$$X_2 = 15.89 + 0.504x_1$$

so that, once the hot-water solubles percentage is given, the requirement of active alkali can be estimated. The fiducial boundary for the relationship is given by substituting $Y = 1.176$ in equation (38). 
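
The arithmetic of this section can be retraced from the summary figures alone. The sketch below uses only the Python standard library; the variable and function names are assumptions, and the separate treatment of $x_1$ and $x_2$ relies on the two variables being uncorrelated, as stated above.

```python
import math

# Summary figures from Tables 4, 5 and 6.
n = 60
mean_x1, mean_x2, mean_y = 7.486, 18.0, 1.2756
ss_x1, ss_x2 = 516.315, 300.0          # sums of squares of x1 and x2
sp_x1y, sp_x2y = 15.5292, -17.887      # sums of products with y
resid_ms = 0.005804                    # residual mean square, 57 d.f.
f_point = 7.102                        # 1 % point of F(1, 57) quoted in the text

# With uncorrelated x1 and x2 each coefficient is a simple ratio.
b1 = sp_x1y / ss_x1
b2 = sp_x2y / ss_x2
a = mean_y - b1 * mean_x1 - b2 * mean_x2

def alkali_requirement(x1, target_y):
    """Solve Y = a + b1*x1 + b2*x2 for x2 at the required lignin level."""
    return (a + b1 * x1 - target_y) / (-b2)

def boundary_half_width(x1, x2):
    """Half-width in Y of the 99 % fiducial boundary at the point (x1, x2)."""
    factor = 1 / n + (x1 - mean_x1) ** 2 / ss_x1 + (x2 - mean_x2) ** 2 / ss_x2
    return math.sqrt(f_point * resid_ms * factor)

target = math.log10(15)                # permanganate number of 15
intercept = alkali_requirement(0.0, target)
slope = alkali_requirement(1.0, target) - intercept
```

The values of `intercept` and `slope` agree with the relationship quoted above to the accuracy printed there.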





Table 5. Calculation of regression coefficients from values in Table 4 

            Sum of       Sum of products     Regression               Regression sum
            squares      with $y$            coefficient              of squares
  $x_1$     516.315        15.5292           +0.030077 ± 0.0034          0.4671
  $x_2$     300           -17.887            -0.059623 ± 0.0044          1.0665
                                                                         1.5336


Table 6. Analysis of variance 

                 Degrees of freedom     Sum of squares     Mean square
  Regression              2                 1.5336           0.7668**
  Residual               57                 0.3308           0.005804
  Total                  59                 1.8644

** Significant at 1% level. 


8. PROPORTIONAL REGRESSIONS 
In certain cases it is of interest to fit equations in which the coefficients are proportional. For instance, in studying various properties of coals, such as their carbon content, sulphur content and calorific value, it may be supposed that each is linearly related to the percentage ash content. It might be expected that, if the ash were simply the result of admixed impurities, its effect would be a simple percentage reduction, the same for each of the properties. Thus, if $x$ were the ash content, and the regression of the $j$th property $y_j$ on $x$ were 

$$Y_j = b_{0j} + b_jx, \tag{39}$$

we should expect $b_j$ to be negative, and $b_j/b_{0j}$ to be in the neighbourhood of $-1/100$. In general, if the theoretical value of the ratio were $-1/\xi$, we could fit the restricted regression equations 

$$Y_j = b'_j(x - \xi) \tag{40}$$

the value of $\xi$ being the same for each line. 

We should then be interested in testing, first, the validity of the assumption of a constant value $\xi$ for the different dependent variables, and secondly, the acceptability of various values of $\xi$. 

We shall here consider only the case of one independent variable, which seems to be of 
most practical interest; the extension to more than one independent variable introduces 
no new principle. The additional complication in fitting proportional equations arises from 
the fact that there is a constant common to all the equations. 

Since the equations of estimation of $\xi$ are not linear, the method of least squares does not lead to exact significance tests and fiducial limits. Instead, the following method is adopted. The null hypothesis on which the test is based is that the regressions are proportional, with constant of proportionality $\xi$. The constant is unspecified, so that the test criteria are functions of $\xi$. If for any particular values of $\xi$ the test criteria are significant, the null hypothesis, and the corresponding value of $\xi$, are rejected at the level of significance adopted. We are thus able to set fiducial limits on $\xi$. 

We shall denote the sum of squares of $x$ by $t$, and the sum of products of $y_j$ with $x$ by $p_j$, and shall adopt the following notation for the restricted regressions with constant $\xi$: 

$$p'_j = S\,y_j(x - \xi), \qquad t' = S(x - \xi)^2, \qquad b'_j = p'_j/t'.$$

To test the validity of the hypothesis, consider the unrestricted regressions, which may be written 

$$Y_j = \bar{y}_j + b_j(x - \bar{x}).$$

When $x = \xi$, $Y_j$ should differ from zero only by errors of random sampling. Hence, the $q$ quantities 

$$z_j = \bar{y}_j + b_j(\xi - \bar{x}) \tag{41}$$

have a joint normal distribution centered on zero. The analysis of these quantities provides tests of the hypothesis. 


The covariance of $z_j$ and $z_k$, estimated with $n-2$ degrees of freedom, is 

$$\frac{t'}{nt}\,\frac{v_{jk}}{n-2},$$

so that the sum of squares of these quantities may be taken as 

$$\frac{nt}{t'}\sum_j\sum_k v^{jk}z_jz_k. \tag{42}$$

This is distributed as the ratio of two independent sums of squares with $q$ and $n-q-1$ degrees of freedom, so may be tested directly by means of the F-distribution. We have 

$$F = \frac{n-q-1}{q}\,\frac{nt}{t'}\sum_j\sum_k v^{jk}z_jz_k. \tag{43}$$

This provides an overall test of the assumption of proportionality and of the specified value of the constant. This may be partitioned to give tests separately of the two aspects. 









Now the quantities $b'_j$, or the equivalent $p'_j$, can be shown to represent the variation among the restricted regressions. Also, since 

$$p'_j = p_j - n\bar{y}_j(\xi - \bar{x}) \tag{44}$$

we see that the expected value of the correlation of $z_j$ and $p'_j$ between lines is zero. This is an expression of the fact that the $p'_j$ account for all, and the $z_j$ for none, of the variation between lines. 

Hence, the sample regression of the $z_j$ on the $p'_j$ provides a test criterion for the hypothetical value of $\xi$. 

The sum of squares for regression is 

$$\frac{nt\bigl(\sum_j\sum_k v^{jk}z_jp'_k\bigr)^2}{t'\sum_j\sum_k v^{jk}p'_jp'_k}. \tag{45}$$

This is distributed as the ratio of two independent sums of squares with 1 and $n-q-1$ degrees of freedom, and thus may be tested by the F distribution. 

$$F = \frac{(n-q-1)\,nt\bigl(\sum_j\sum_k v^{jk}z_jp'_k\bigr)^2}{t'\sum_j\sum_k v^{jk}p'_jp'_k}. \tag{46}$$

The sum of squares, with $q-1$ and $n-q-1$ degrees of freedom, for departures of the $z_j$ from regression on the $p'_j$, which is given by the difference between (42) and (45), is available for testing departure from proportionality. 

It is convenient to express these sums of squares and products in a form which shows explicitly their dependence on $\xi$. If we write 

$$J = n\sum_j\sum_k v^{jk}\bar{y}_j\bar{y}_k,$$

$$K = \frac{n}{t}\sum_j\sum_k v^{jk}\bar{y}_jp_k,$$

$$L = \frac{n}{t^2}\sum_j\sum_k v^{jk}p_jp_k,$$

then the total sum of squares (42) of the $z_j$ is 

$$\frac{t}{t'}\{J + 2(\xi - \bar{x})K + (\xi - \bar{x})^2L\} \tag{47}$$

while the sum of squares (45) for regression of $z_j$ on $p'_j$ is 

$$\frac{nt[(\xi - \bar{x})^2K - (\xi - \bar{x})\{(t/n)L - J\} - (t/n)K]^2}{t'[n(\xi - \bar{x})^2J - 2(\xi - \bar{x})tK + (t^2/n)L]}. \tag{48}$$

The sum of squares for departure from proportionality is found, by subtraction, to be 

$$\frac{tt'[JL - K^2]}{n[n(\xi - \bar{x})^2J - 2(\xi - \bar{x})tK + (t^2/n)L]}. \tag{49}$$










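The full analysis may be set up in the form of an analysis of variance, as follows: 

                                    Degrees of
                                    freedom        Sum of squares
  Constant of proportionality          1           $\dfrac{nt[(\xi - \bar{x})^2K - (\xi - \bar{x})\{(t/n)L - J\} - (t/n)K]^2}{t'[n(\xi - \bar{x})^2J - 2(\xi - \bar{x})tK + (t^2/n)L]}$
  Departure from proportionality     $q-1$         $\dfrac{tt'[JL - K^2]}{n[n(\xi - \bar{x})^2J - 2(\xi - \bar{x})tK + (t^2/n)L]}$
  Error                              $n-q-1$       1
  Total                              $n-1$         $1 + (nt/t')\sum_j\sum_k v^{jk}z_jz_k = |\,v_{jk} + (nt/t')z_jz_k\,| / |\,v_{jk}\,|$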
Provided the departure from proportionality is not significant, the analysis enables us to test a specified value of $\xi$, or, more generally, to set fiducial limits. An estimate of the constant of proportionality is one of the two values of $\xi$ which make the sum of squares for constant of proportionality vanish. Now one of these values minimizes the sum of squares for departure from proportionality, while the other maximizes it. The former value is the appropriate estimate. It will be noted that, considered as a function of $\xi$, the sum of squares for departure from proportionality is inversely proportional to 

$$\frac{1}{t'}\Bigl[n(\xi - \bar{x})^2J - 2(\xi - \bar{x})tK + \frac{t^2}{n}L\Bigr] \tag{50}$$

which is the sum of squares for differences among the $p'_j$, or for difference of regressions. Thus the optimum estimate of $\xi$ is that which minimizes departure from proportionality and maximizes difference of (proportional) regressions. 
If $X$ is the estimate of $\xi$, the equation for $X$ is 

$$(X - \bar{x})^2K - (X - \bar{x})\{(t/n)L - J\} - (t/n)K = 0,$$

giving 

$$X = \bar{x} + \frac{(t/n)L - J \pm \sqrt{[\{(t/n)L + J\}^2 - (4t/n)(JL - K^2)]}}{2K}. \tag{51}$$

The two roots lie on opposite sides of $\bar{x}$. 
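
The estimate given by (51) can be computed from raw data as follows. This is a sketch assuming NumPy; the function name, the simulated data and the true constant `xi_true = 14` are hypothetical, and the root is chosen by maximizing expression (50), which is equivalent to minimizing the departure from proportionality.

```python
import numpy as np

def proportionality_estimate(x, Y):
    """Roots of the quadratic leading to (51) and the preferred estimate of xi.

    x: (n,) values of the independent variable; Y: (n, q) dependent variables.
    """
    n = Y.shape[0]
    x_bar, y_bar = x.mean(), Y.mean(axis=0)
    dx = x - x_bar
    t = dx @ dx                              # sum of squares of x
    p = dx @ Y                               # sums of products of y_j with x
    resid = Y - y_bar - np.outer(dx, p / t)  # unrestricted regression residuals
    V_inv = np.linalg.inv(resid.T @ resid)   # inverse residual sums of products
    J = n * y_bar @ V_inv @ y_bar
    K = (n / t) * y_bar @ V_inv @ p
    L = (n / t**2) * p @ V_inv @ p
    A = (t / n) * L - J
    disc = np.sqrt(A**2 + 4 * (t / n) * K**2)  # same discriminant as in (51)
    roots = x_bar + (A + np.array([1.0, -1.0]) * disc) / (2 * K)

    def spread(xi):                          # expression (50)
        d = xi - x_bar
        return (n * d**2 * J - 2 * d * t * K + (t**2 / n) * L) / (t + n * d**2)

    return x_bar, roots, max(roots, key=spread)

# Simulated proportional regressions (hypothetical data).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 25)
xi_true = 14.0
slopes = np.array([-1.0, -0.4, -2.5])
Y = np.outer(x - xi_true, slopes) + 0.05 * rng.normal(size=(25, 3))
x_bar, roots, xi_hat = proportionality_estimate(x, Y)
```

In this simulated case the preferred root lies close to the assumed constant, while the other root falls on the opposite side of the mean of `x`.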


In other contexts, an analysis similar to this one will provide a test for the constancy of 
a set of ratios, and fiducial limits for their common expected value. 


REFERENCES 


Box, G. E. P. & Hunter, J. S. (1954). A confidence region for the solution of a set of simultaneous equations with an application to experimental design. Biometrika, 41, 190-9. 

Cohen, W. E. & Mackney, A. W. (1951). Influence of wood extractives on soda and sulphate pulping. Proc. Aust. Pulp. Paper Ind. Tech. Assoc. 5, 315-35. 

Fisher, Hans, Hansen, R. G. & Norton, H. W. (1955). Quantitative determination of glucose and galactose. Analyt. Chem. 27, 857-9. 

van der Waerden, B. L. (1931). Moderne Algebra. Berlin: Springer. 







ONE-WAY VARIANCES IN A TWO-WAY CLASSIFICATION 


By THOMAS S. RUSSELL AND RALPH ALLAN BRADLEY*
Virginia Agricultural Experiment Station of the Virginia Polytechnic Institute 


1. INTRODUCTION 


This research is concerned with the estimation of error variances in a non-replicated two- 
way classification and with inferences based on the estimators derived. The results are of 
interest in a wide class of applications. In general, the procedures developed may be used 
in checking the assumption of homogeneous error variances in randomized block designs 
under certain conditions and, in particular, use is seen in comparing the precisions of 
analytical methods in quantitative experimentation and the consistencies of judges in 
subjective experimentation. Some of the results resemble those obtained by Bartlett (1937)
who considered the comparison of estimates of variance from independent samples. 

Papers by Grubbs (1948) and Ehrenberg (1950) relate to our problem. We shall comment 
on these papers in turn. 

Grubbs was interested in measuring the burning time of a powder-train fuse through the 
use of several timing instruments attached to a rifle. He required estimates of both powder 
variability and instrument variability. The model assumed was that 


    $y_{ij} = \mu_i + \epsilon_{ij}$  $(i = 1, \ldots, n;\ j = 1, \ldots, r)$,    (1-1)
where, for example,

$y_{ij}$ is the observation on the ith fuse by the jth instrument,

$\mu_i$ is the effect of the ith fuse, and

$\epsilon_{ij}$ is the error in measurement of the ith fuse by the jth instrument.

Both $\mu_i$ and $\epsilon_{ij}$ were assumed to be normally and independently distributed, $\mu_i$ with
mean $\mu$ and variance $\sigma_\mu^2$, $\epsilon_{ij}$ with mean zero and variance $\sigma_j^2$. Grubbs used two estimation
procedures that in fact yielded identical estimators for $\sigma_j^2$. It can be shown that his estimators
and those obtained later by Ehrenberg are the same as those obtained in this paper.
Grubbs suggests that a certain function of his estimator and $\sigma_j^2$ has approximately a
$\chi^2$-distribution under certain conditions and states that other rough test procedures may
be used. We shall comment further on these test situations in connection with tests that we
develop.

Ehrenberg essentially obtained three different estimators of $\sigma_j^2$, although his model was
somewhat different from that of Grubbs and is identical with the one that we shall set forth 
in the following section. One estimator was derived by approximating to the solution of 
a set of equations resulting from application of the method of maximum likelihood and 


* This paper is a revision of Technical Report no. 9 on the first contract referenced below. A Ph.D. 
dissertation based on that report was submitted by T. S. Russell to the Virginia Polytechnic Institute 
in partial fulfilment of the requirements for the degree. Work by T. S. Russell was supported in part by 
U.S. Army Quartermaster Research and Development Command under Contract no. DA 44-109-qm- 
1488. The views and conclusions in this paper are those of the authors and do not necessarily reflect the 
views, or have the indorsement, of the Department of Defence. T. S. Russell is now with Washington State
College, Pullman. Contributions of R. A. Bradley were sponsored by the Agricultural Research Service, 
U.S. Department of Agriculture, under a Research and Marketing Act Contract No. 12-14-100-126(20). 













correcting the approximation for bias. This estimator coincides with the one derived by 
Grubbs and the one that we develop. Ehrenberg obtained the values of the coefficients of 
a general quadratic form in $y_{ij}$ required to yield his first estimator, but we shall use a
quadratic form estimator in a somewhat different way. Ehrenberg’s second estimator was
unbiased but had a variance dependent upon values $\mu_i$. His third estimator depended on
the use of ranges. He noted that it may be presumed that his last estimator is less efficient 
than the first, as is usual in variance estimation by ranges in comparison with variance 
estimation by second sampling moments. 

In this paper the estimator common to Grubbs and Ehrenberg is re-derived using two 
different approaches. Certain tests of hypotheses based on these estimators are developed 
and we shall show that, under certain conditions, the estimator has an exact distribution 
which is that of a linear function of two independent $\chi^2$-variates. The $\chi^2$-approximation of
Grubbs (a similar approximation was suggested by Ehrenberg) may be taken to be an 
approximation to our exact distribution. 


2. MATHEMATICAL MODEL 


For the two-way classification, we assume the model 


    $y_{ij} = \mu_i + \beta_j + \epsilon_{ij}$  $(i = 1, \ldots, n;\ j = 1, \ldots, r)$,    (2-1)

where

$y_{ij}$ is the observation in the ith row (on the ith item) and jth column (by the jth observer),

$\mu_i$ is a parameter representing the mean of the ith row,

$\beta_j$ is a parameter representing the additional effect of the jth column, and

$\epsilon_{ij}$ is a normal variate with zero mean, $\epsilon_{ij}$ independent of any other $\epsilon_{hk}$.
The model differs from the usual one of analysis of variance in that here we take $\epsilon_{ij}$ to have
variance
    $V(\epsilon_{ij}) = \sigma_j^2$  $(i = 1, \ldots, n;\ j = 1, \ldots, r)$.    (2-2)

We shall use the restriction, $\sum_j \beta_j = 0$, often employed for determinacy of solution of least
squares. We could have written $\mu_i = \mu + \tau_i$ in terms of a mean effect and an effect of the ith
row, where $\sum_i \tau_i = 0$, but we did not choose to do so.


The postulated model may be suitable in many situations. We note two. Consider a 
judging panel of r members scoring n items subjectively. It often happens that replication 
is impossible, for judges remember scores given to items from previous presentation of the 
items. But a measure of the scoring ability of a judge is required. In such situations it is 
unusual to have any objective measure of the ‘true worth’ of an item. It is necessary to 
measure the consistency (and, in a sense, the judging ability) of a judge by comparing his 
scores assigned to items with the averages of scores assigned by all of the judges but allowing 
for a possible constant bias $\beta_j$ on the part of the judge. Ehrenberg was interested in this
type of problem. The use of subjective scores, usually based on a discrete scoring scale with 
a very limited number of scoring points, is somewhat inconsistent with the assumption of 
normality of the $\epsilon_{ij}$ which is used in the development of tests, but it is unlikely that such
departures from normality will seriously alter conclusions based on the tests. 

For a second example, consider n items and r processes. We might perhaps have n 
samples for chemical analysis and r methods of analysis. Now $\beta_j$ may be a bias due to the
jth method of analysis. $\mu_i$ would be the true mean of the ith sample and $\sigma_j^2$ would represent
the precision of the jth method of analysis. Grubbs assumed that $\beta_j = 0$ and that $\mu_i$ was
a random variable; however, he was interested in examples like this depending upon
objective measurement.

Returning to the model, we introduce a matrix notation that will be useful later. Let Y
be a column vector of nr elements $y_{ij}$ or of r column vectors $Y_j$, each with n elements $y_{ij}$.
The transpose of Y is

    $Y' = (Y_1', \ldots, Y_r')$    (2-3)
and
    $Y_j' = (y_{1j}, \ldots, y_{nj})$  $(j = 1, \ldots, r)$.    (2-4)
We define
    $B' = (\mu_1, \ldots, \mu_n, \beta_1, \ldots, \beta_r)$    (2-5)
and
    $A' = (A_1', \ldots, A_r')$,    (2-6)
an (n+r) by nr matrix expressed in terms of (n+r) by n submatrices. We write

    $A_j = [a_{ijp}]$  $[i = 1, \ldots, n;\ p = 1, \ldots, (n+r)]$,    (2-7)
where
    $a_{ijp} = 1$ when $p = i$ or $(n+j)$,  $a_{ijp} = 0$ otherwise.    (2-8)
It is now apparent that
    $Y = AB + \epsilon$,    (2-9)
where $\epsilon$ is the vector of variates $\epsilon_{ij}$ in positions corresponding to the elements of Y. (2-9) may
be taken as a representation of sample observations in terms of the model (2-1). If we take
expectations, we have
    $E(Y) = AB$.    (2-10)
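
The construction of A in (2-7) and (2-8) can be illustrated with a small sketch. The Python fragment below is not part of the original paper: the function names and the numerical values of the row means and column effects are hypothetical, and the rows of A are taken in the order of the elements of Y (all items for column 1, then all items for column 2, and so on).

```python
def design_matrix(n, r):
    """Return A with nr rows and (n + r) columns, rows ordered like Y."""
    rows = []
    for j in range(r):              # column (observer) index, outermost
        for i in range(n):          # row (item) index, innermost
            row = [0] * (n + r)
            row[i] = 1              # selects mu_i from B
            row[n + j] = 1          # selects beta_j from B
            rows.append(row)
    return rows


def expected_y(mu, beta):
    """E(Y) = AB for B' = (mu_1, ..., mu_n, beta_1, ..., beta_r)."""
    b = list(mu) + list(beta)
    a = design_matrix(len(mu), len(beta))
    return [sum(a_sp * b_p for a_sp, b_p in zip(row, b)) for row in a]


mu = [10.0, 12.0, 15.0]     # hypothetical row means
beta = [0.5, -0.5]          # hypothetical column effects, summing to zero
ey = expected_y(mu, beta)
print(ey)
```

Each element of the result is simply $\mu_i + \beta_j$ for the corresponding cell, which is all that (2-10) asserts.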


3. MAXIMUM-LIKELIHOOD ESTIMATION 
(i) A direct application 


The straightforward application of the method of maximum likelihood for estimation of
the parameters in B and the $\sigma_j^2$ leads into difficulty. The likelihood function is

    $f(Y) = (2\pi)^{-nr/2} \Bigl(\prod_j \sigma_j^{-n}\Bigr) \exp\Bigl[-\sum_i \sum_j (y_{ij} - \mu_i - \beta_j)^2 / 2\sigma_j^2\Bigr]$.    (3-1)

The normal equations, resulting from the process of maximizing or minimizing f(Y) with
respect to the parameters, yield solutions for $\hat\mu_i$ and $\hat\beta_j$, but with $\hat\mu_i$ dependent upon $\hat\sigma_j^2$, and
reduction of the normal equations yields

    $\hat\sigma_j^2 = \frac{1}{n} \sum_i \Bigl[ (y_{ij} - y_{.j}) - \frac{\sum_{q=1}^{r} (y_{iq} - y_{.q})/\hat\sigma_q^2}{\sum_{q=1}^{r} 1/\hat\sigma_q^2} \Bigr]^2$  $(j = 1, \ldots, r)$    (3-2)

in the attempt to evaluate $\hat\sigma_j^2$. This is a procedure also considered by Ehrenberg.*
Iterative solutions of (3-2) converge yielding a value zero for one $\hat\sigma_p^2$. The other estimates
become

    $\hat\sigma_j^2 = \sum_i (y_{ij} - y_{.j} - y_{ip} + y_{.p})^2 / n$  $(j \ne p;\ j = 1, \ldots, r)$,    (3-3)
* Ehrenberg also noted that a simple but inconsistent solution to the normal equations related to 
f(Y) exists in the special case with r = 2. We leave the reader to refer to his paper for a discussion of this 
special case. 
since
    $\lim_{\hat\sigma_p^2 \to 0} \Bigl[\sum_q (y_{iq} - y_{.q})/\hat\sigma_q^2\Bigr] \Big/ \sum_q 1/\hat\sigma_q^2 = y_{ip} - y_{.p}$.    (3-4)


Actually if any one $\hat\sigma_p^2$ is zero, a solution to (3-2) in the form (3-3) for the remaining $\hat\sigma_j^2$ exists.
But the evaluation of (3-1) with these solutions minimizes that function, for the evaluation
of (3-1) essentially depends on


lim A exp : 0. 


0307p o, “i 
(ii) An indirect application 
We have been led into difficulties in our attempt to use maximum likelihood to obtain 
estimators of the $\sigma_j^2$ in the direct application of that method to the likelihood function f(Y)
in (3-1). Now, if each $\sigma_j^2 = \sigma^2$, $\sigma^2$ would be estimated in the usual way of the analysis of
variance for a two-way classification. That estimator of $\sigma^2$ depends on $(n-1)(r-1)$ linear
contrasts formed from the original observations. These contrasts have zero means and do
not depend on the $\mu_i$ and $\beta_j$ whether or not $\sigma_j^2 = \sigma^2$ $(j = 1, \ldots, r)$. The estimator of $\sigma^2$ of the
analysis of variance is not the maximum-likelihood estimator obtainable from (3-1) with
$\sigma_j^2 = \sigma^2$. But it is the maximum-likelihood estimator with reference to the likelihood
function of any set of $(n-1)(r-1)$ linear, and linearly independent, error contrasts. This
suggests that difficulties in estimation based on (3-1), which stem from the simultaneous
estimation of the $\mu_i$, $\beta_j$, and $\sigma_j^2$, may be circumvented if we first transform our data and then
restrict our attention to a set of $(n-1)(r-1)$ error contrasts for the estimation of the $\sigma_j^2$.
We now proceed in this way. 
We use a set of (n—1)(r—1) error contrasts that have a reasonably simple variance- 
covariance matrix. Let Z be a column vector of (n — 1) (r— 1) new variates defined by 


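    $Z = CY$.    (3-5)

C, the matrix of the transformation, has $(n-1)(r-1)$ rows and nr columns defined by the
‘direct-product’ of matrices* $D_n$ and $(-D_r)$,

    $C = D_n \times (-D_r)$.    (3-6)

The matrix $D_n$ has $(n-1)$ rows and n columns and is

    $D_n = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}$.    (3-7)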














$D_r$ has the same form with $(r-1)$ rows and r columns. The elements of Z form one of the
possible sets of $(n-1)(r-1)$ linearly and stochastically independent contrasts associated
with the error variance of the analysis of variance for the two-way classification when
$\sigma_j^2 = \sigma^2$ $(j = 1, \ldots, r)$. It is clear, even when $\sigma_j^2 \ne \sigma^2$, that

    $CA = 0$    (3-8)


* The direct-product notation is as used by MacDuffee (1946) and is the same as the Kronecker product
used by Vartak (1955). The notation here implies that each element in $-D_r$ acts as a scalar
multiplying $D_n$; every element in $-D_r$ is replaced by the product of the element and $D_n$. Then C has
$(n-1)(r-1)$ rows and nr columns as indicated above.





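and
    $E(Z) = 0$    (3-9)
in view of (2-9).
The variance-covariance matrix $\Sigma_z$ of the new variates in Z is

    $\Sigma_z = C \Sigma C'$,    (3-10)
where
    $\Sigma = \begin{pmatrix} \sigma_1^2 I_n & 0 & \cdots & 0 \\ 0 & \sigma_2^2 I_n & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_r^2 I_n \end{pmatrix}$    (3-11)

and $I_n$ is the n by n identity matrix. An alternate form for $\Sigma_z$, apparent when the
multiplication is effected in (3-10), is

    $\Sigma_z = (D_n D_n') \times H$,    (3-12)

where H is $(r-1)$-square and written

    $H = \begin{pmatrix} \sigma_1^2 + \sigma_2^2 & \sigma_1^2 & \cdots & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2 + \sigma_3^2 & \cdots & \sigma_1^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_1^2 & \sigma_1^2 & \cdots & \sigma_1^2 + \sigma_r^2 \end{pmatrix}$.    (3-13)

The joint density function of the new variates is

    $f(Z) = (2\pi)^{-\frac12 (n-1)(r-1)} |\Sigma_z^{-1}|^{\frac12} \exp(-\tfrac12 Z' \Sigma_z^{-1} Z)$.    (3-14)

It follows that
    $\Sigma_z^{-1} = (D_n D_n')^{-1} \times H^{-1}$    (3-15)
in view of (3-12),
    $(D_n D_n')^{-1} = \frac{1}{n} \begin{pmatrix} (n-1) & -1 & \cdots & -1 \\ -1 & (n-1) & \cdots & -1 \\ \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & \cdots & (n-1) \end{pmatrix}$    (3-16)
and
    $H^{-1} = \frac{1}{c} \begin{pmatrix} \frac{c\sigma_2^2 - \sigma_1^2}{\sigma_2^4} & \frac{-\sigma_1^2}{\sigma_2^2 \sigma_3^2} & \cdots & \frac{-\sigma_1^2}{\sigma_2^2 \sigma_r^2} \\ \frac{-\sigma_1^2}{\sigma_2^2 \sigma_3^2} & \frac{c\sigma_3^2 - \sigma_1^2}{\sigma_3^4} & \cdots & \frac{-\sigma_1^2}{\sigma_3^2 \sigma_r^2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{-\sigma_1^2}{\sigma_2^2 \sigma_r^2} & \frac{-\sigma_1^2}{\sigma_3^2 \sigma_r^2} & \cdots & \frac{c\sigma_r^2 - \sigma_1^2}{\sigma_r^4} \end{pmatrix}$    (3-17)
with
    $c = \sigma_1^2 \sum_{q=1}^{r} 1/\sigma_q^2$.    (3-18)

It is easy to verify that (3-16) and (3-17) are correct. $(D_n D_n')^{-1}$ is an $(n-1)$-square matrix.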
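
The relations (3-8), (3-12) and (3-15) to (3-18) can be checked numerically for a small case. The following Python sketch is an illustration only and is not taken from the paper: the helper names are hypothetical, the variances are arbitrary values chosen for the check, and exact rational arithmetic is used so that the comparisons are exact. The function `direct_product` follows the convention of the footnote to (3-6), in which each element of the second matrix is replaced by that element times the first matrix.

```python
from fractions import Fraction


def d_matrix(m):
    """D_m of (3-7): row k has 1 in the first column and -1 in column k + 2."""
    return [[1 if c == 0 else (-1 if c == k + 1 else 0) for c in range(m)]
            for k in range(m - 1)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def direct_product(a, b):
    """Each element of b is replaced by that element times a."""
    return [[bij * aij for bij in brow for aij in arow] for brow in b for arow in a]


n, r = 3, 3
sigma2 = [Fraction(1), Fraction(2), Fraction(5)]    # hypothetical sigma_j^2

dn, dr = d_matrix(n), d_matrix(r)
c_mat = direct_product(dn, [[-x for x in row] for row in dr])        # (3-6)

# A of (2-7), (2-8) and Sigma of (3-11), both in the ordering of Y.
a_mat = [[1 if p in (i, n + j) else 0 for p in range(n + r)]
         for j in range(r) for i in range(n)]
big_sigma = [[sigma2[s // n] if s == u else Fraction(0) for u in range(n * r)]
             for s in range(n * r)]

ca = matmul(c_mat, a_mat)                                             # (3-8)
sigma_z = matmul(matmul(c_mat, big_sigma), transpose(c_mat))          # (3-10)

# H of (3-13) and the alternate form (3-12).
h = [[sigma2[0] + (sigma2[j + 1] if j == k else 0) for k in range(r - 1)]
     for j in range(r - 1)]
sigma_z_alt = direct_product(matmul(dn, transpose(dn)), h)

# Inverses (3-16) and (3-17), with c of (3-18).
dd_inv = [[Fraction(n - 1 if s == u else -1, n) for u in range(n - 1)]
          for s in range(n - 1)]
c = sigma2[0] * sum(1 / s for s in sigma2)
h_inv = [[((c * sigma2[j] - sigma2[0]) / sigma2[j] ** 2 if j == k
           else -sigma2[0] / (sigma2[j] * sigma2[k])) / c
          for k in range(1, r)] for j in range(1, r)]
product = matmul(sigma_z, direct_product(dd_inv, h_inv))              # (3-15)
identity = [[1 if s == u else 0 for u in range(len(product))]
            for s in range(len(product))]

ca_is_zero = all(x == 0 for row in ca for x in row)
print(ca_is_zero, sigma_z == sigma_z_alt, product == identity)
```

The check covers one choice of n, r and variances only; it is a sanity check on the displayed forms, not a proof.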


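We shall maximize $L = \ln f(Z)$ from (3-14) with respect to $\omega_j$, where

    $\omega_j = \sigma_j^2 / |H|$    (3-19)

and we use ‘ln’ to denote ‘natural logarithm’. When maximum-likelihood estimators of
$\omega_j$ are available, (3-19) is solved for the estimators of $\sigma_j^2$. Now

    $\frac{\partial L}{\partial \omega_j} = \frac{1}{2 |\Sigma_z^{-1}|} \frac{\partial |\Sigma_z^{-1}|}{\partial \omega_j} - \tfrac12 Z' \frac{\partial \Sigma_z^{-1}}{\partial \omega_j} Z$.    (3-20)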




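Let
    $f_{jsu} = \partial \sigma^{su} / \partial \omega_j$  $[s, u = 1, \ldots, (n-1)(r-1)]$,    (3-21)

where $\sigma^{su}$ is the (s, u)-element of $\Sigma_z^{-1}$. It follows that the first term of the right-hand member
of (3-20) becomes
    $\tfrac12 \sum_s \sum_u f_{jsu} \sigma_{su}$
and the second term becomes
    $-\tfrac12 \sum_s \sum_u f_{jsu} z_s z_u$.
Then, when $\partial L / \partial \omega_j$ is equated to zero, we have

    $\sum_s \sum_u f_{jsu} (\sigma_{su} - z_s z_u) = 0$  $(j = 1, \ldots, r)$.    (3-22)

Solution of (3-22) yields the maximum-likelihood estimators $\hat\sigma_j^2$ of $\sigma_j^2$. Note that

    $\| f_{jsu} \| = (D_n D_n')^{-1} \times \frac{\partial}{\partial \omega_j} H^{-1}$    (3-23)

from (3-15) and because $(D_n D_n')^{-1}$ is a matrix of constants.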
We have been able to solve the equations (3-22) when r = 3. Then the equations reduce to 


    $2(n-1)\hat\sigma_1^2 = Z' \| f_{2su} + f_{3su} - f_{1su} \| Z$,    (3-24)
    $2(n-1)\hat\sigma_2^2 = Z' \| f_{1su} + f_{3su} - f_{2su} \| Z$,    (3-25)
and
    $2(n-1)\hat\sigma_3^2 = Z' \| f_{1su} + f_{2su} - f_{3su} \| Z$.    (3-26)

We shall not solve these equations here but delay solution until the next section, where the
solutions will be compared with those obtained by another method. When r > 3, solution of
equations (3-22) involves the solution of simultaneous polynomial equations in the $\sigma_j^2$.
When r = 2, solution of the two equations represented by (3-22) is impossible; then the
equations become identical except for a constant multiplier and a solution for $(\sigma_1^2 + \sigma_2^2)$
only is possible.*

4. QUADRATIC FORM ESTIMATION 


A quadratic form in the original observations $y_{ij}$ would be a desirable form for an estimator
of $\sigma_t^2$ $(t = 1, \ldots, r)$. Consider the general form

    $Q_t = \sum_i \sum_j \sum_h \sum_k m_{ij,hk}(t)\, y_{ij} y_{hk}$  $(i, h = 1, \ldots, n;\ j, k = 1, \ldots, r)$.    (4-1)

$Q_t$ is determined when values of $m_{ij,hk}(t)$ are determined. This is done by imposing reasonable
restrictions of symmetry on the $m_{ij,hk}(t)$ and by requiring† that $E(Q_t) = \sigma_t^2$ independent
of $\mu_i$ and $\beta_j$.


* That no solution is possible when r = 2 is also apparent from the special form of (3-14) which now
depends only on $(\sigma_1^2 + \sigma_2^2)$. It is also well known that there the two-way classification reduces to paired
observations and error contrasts depend only on differences between paired observations. These differences
have variances $(\sigma_1^2 + \sigma_2^2)$ under our model. Ehrenberg noted that one of his methods failed in this
case and proposed a second procedure that is workable here but that has other disadvantages. The
equality of two variances may be tested by testing the correlation of the sums and differences of paired
observations against zero.

† Ehrenberg’s second estimator is unbiased but has a variance, and consequently a distribution,
depending on the $\mu_i$. This may not be a disadvantage in some cases, but, if variance estimators from
several experiments are to be compared, the disadvantage may be quite real. The estimators obtained
here will not have distributions depending on the $\mu_i$ and $\beta_j$.










Conditions to be satisfied by $Q_t$ are
(i) $Q_t$ must be invariant under interchange of order of items.

(ii) $Q_t$ must be independent of the parameters $\mu_i$ and $\beta_j$.

(iii) $Q_t$ must be an unbiased estimator of $\sigma_t^2$.

We think of M, the matrix of the form (4-1), as an r by r matrix of matrix elements $M_{jk}$
which are themselves n by n matrices with elements $m_{ij,hk}$ in the (i, h)-position of $M_{jk}$.
We drop the argument t with M and $m_{ij,hk}$ for simplicity except where this may lead to
ambiguity.

Condition (i) implies that Mi six = Magne (4-2) 


These equations mean that $M_{jk}$ is symmetric with equal diagonal elements. Condition (ii)
requires that $Q_t$ should not depend on $\mu_i$ and $\beta_j$ and this essentially means that $Q_t$ is a function
of the $\epsilon_{ij}$ of (2-1). Condition (ii) is satisfied if

    $\sum_j m_{ij,hk} = 0$    (4-4)
and
    $\sum_i m_{ij,hk} = 0$.    (4-5)

Equation (4-4) states that the sum of corresponding elements from each $M_{jk}$ (or $M_{kj}$) in any
column (or row) of M is zero and equation (4-5) states that the sum of the elements in any
row (or column) of $M_{jk}$ is zero. In order that $E(Q_t) = \sigma_t^2$ as required by (iii), it is necessary that

    $\sum_i m_{it,it} = 1$    (4-6)
and
    $\sum_i m_{ij,ij} = 0$  $(j \ne t)$.    (4-7)
v 

Equations (4-2) through (4-7) are not sufficient to determine M uniquely when r > 2, but
are inconsistent* when r = 2. Additional assumptions of symmetry are imposed when r > 2
and the following discussion is limited to the case with r > 2. The elements $m_{ij,hk}$ of M may
be regarded as weights assigned to observation products $y_{ij} y_{hk}$ in $Q_t$. Differences in weights
should depend on the variances of the observations in these products; usually no a priori
knowledge of these weights will be available, and in addition we shall be concerned with
tests of null hypotheses postulating equal variances and hence equal weights. We then
essentially regard it to be reasonable to assume equal variances $\sigma_j^2$ $(j \ne t)$, leading to further
definition of the elements of M. The weighting system used in the estimation of $\sigma_t^2$ and in
the determination of M requires that

    $M_{pp} = M_{qq}$  $(p, q \ne t;\ p, q = 1, \ldots, r)$,    (4-8)

    $M_{pq} = M_{uv}$  $(p \ne q,\ u \ne v;\ p, q, u, v \ne t;\ p, q, u, v = 1, \ldots, r)$,    (4-9)

    $M_{tp} = M_{tq}$  $(p, q \ne t;\ p, q = 1, \ldots, r)$,    (4-10)
and
    $m_{ij,hk} = m_{fj,gk}$  $(i \ne h,\ f \ne g;\ f, g, i, h = 1, \ldots, n;\ j, k = 1, \ldots, r)$.    (4-11)
Using all of the relations, (4-2) to (4-11), we obtain

    $n(n-1)(r-1)(r-2) M_{tt} = (r-1)(r-2) M_n$,    (4-12)


* The inconsistency of the equations when r = 2 indicates the impossibility of obtaining estimators
with the required properties in this special case.


    $M_{qq} = 0$  $(q \ne t)$,    (4-13)
    $n(n-1)(r-1)(r-2) M_{tq} = -(r-2) M_n$,  $M_{qt} = M_{tq}$  $(q \ne t)$,    (4-14)
and
    $n(n-1)(r-1)(r-2) M_{pq} = M_n$  $(p, q \ne t;\ p \ne q)$,    (4-15)

where $M_n$ is a matrix with diagonal and non-diagonal elements identical to those of $n(D_n D_n')^{-1}$
defined in (3-16) except that $M_n$ is an n-square matrix.

We shall now show that the estimators obtained in this section are equivalent to those
obtained from maximum likelihood on the contrasts of Z in §3 in the case where n > 2 and
r = 3. The equivalence does not follow when r > 3. We shall demonstrate that $\hat\sigma_1^2 = Q_1$ and
the remaining two similar identities follow from symmetry. $\hat\sigma_1^2$ is defined through (3-24)
in terms of the elements of Z. Rewriting (3-24) in view of (3-5), we have

    $2(n-1)\hat\sigma_1^2 = Y'C' \| f_{2su} + f_{3su} - f_{1su} \| CY$.    (4-16)


Also, in view of (4-12) to (4-15) and (4-1), in the special case with t = 1, r = 3, n > 2,

    $2(n-1) Q_1 = \frac{1}{n} Y' \Bigl[ M_n \times \begin{pmatrix} 2 & -1 & -1 \\ -1 & 0 & 1 \\ -1 & 1 & 0 \end{pmatrix} \Bigr] Y$.    (4-17)











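Evaluation of (4-16) follows from (3-23), (3-17) and (3-13). When r = 3,

    $H^{-1} = \frac{1}{\sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_3^2 + \sigma_2^2\sigma_3^2} \begin{pmatrix} \sigma_1^2 + \sigma_3^2 & -\sigma_1^2 \\ -\sigma_1^2 & \sigma_1^2 + \sigma_2^2 \end{pmatrix}$,    (4-18)

    $\frac{\partial}{\partial\omega_1} H^{-1} = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$,  $\frac{\partial}{\partial\omega_2} H^{-1} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$,  $\frac{\partial}{\partial\omega_3} H^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$.    (4-19)

Then
    $\| f_{2su} + f_{3su} - f_{1su} \| = (D_n D_n')^{-1} \times \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & (D_n D_n')^{-1} \\ (D_n D_n')^{-1} & 0 \end{pmatrix}$.    (4-20)

Now $C = D_n \times (-D_r)$ as given in (3-6) and simple matrix algebra yields

    $C' \| f_{2su} + f_{3su} - f_{1su} \| C = D_n'(D_n D_n')^{-1} D_n \times \begin{pmatrix} 2 & -1 & -1 \\ -1 & 0 & 1 \\ -1 & 1 & 0 \end{pmatrix}$.    (4-21)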














It is again a matter of matrix multiplication based on (3-7) and (3-16) to show that

    $D_n'(D_n D_n')^{-1} D_n = M_n / n$;

this is all that is necessary to show that the matrices of the quadratic forms in the right-hand
members of (4-16) and (4-17) are identical. We shall use the result that $\hat\sigma_1^2 = Q_1$ when n > 2,
r = 3 in the next section.
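
The identity of the maximum-likelihood estimator and $Q_1$ for r = 3 can also be confirmed numerically. The sketch below is illustrative only and is not part of the paper: the function names and the data are hypothetical. It computes the estimator from the contrasts Z through (3-24) with the matrix of (4-20), and separately from the closed form (4-26) for $Q_t$ in terms of the observations, using exact rational arithmetic so that the two can be compared exactly.

```python
from fractions import Fraction


def q_t(y, t):
    """Q_t of (4-26); y holds n rows of r integer observations, t is 0-based."""
    n, r = len(y), len(y[0])
    row = [Fraction(sum(yi), r) for yi in y]
    col = [Fraction(sum(yi[j] for yi in y), n) for j in range(r)]
    grand = Fraction(sum(map(sum, y)), n * r)
    e = [[y[i][j] - row[i] - col[j] + grand for j in range(r)] for i in range(n)]
    g_t = sum(e[i][t] ** 2 for i in range(n))
    err = sum(x ** 2 for ei in e for x in ei)
    return (r * (r - 1) * g_t - err) / ((n - 1) * (r - 1) * (r - 2))


def ml_sigma1_sq(y):
    """Estimate of sigma_1^2 from (3-24) with (4-20); valid for r = 3 only."""
    n = len(y)
    # The two blocks of Z = CY are D_n applied to Y_2 - Y_1 and to Y_3 - Y_1.
    u = [yi[1] - yi[0] for yi in y]
    v = [yi[2] - yi[0] for yi in y]
    z1 = [u[0] - u[k] for k in range(1, n)]
    z2 = [v[0] - v[k] for k in range(1, n)]
    # (D_n D_n')^{-1} from (3-16).
    k_inv = [[Fraction(n - 1 if s == w else -1, n) for w in range(n - 1)]
             for s in range(n - 1)]
    bilinear = sum(z1[s] * k_inv[s][w] * z2[w]
                   for s in range(n - 1) for w in range(n - 1))
    # 2(n-1) sigma_1^2 = 2 * Z_1' (D_n D_n')^{-1} Z_2
    return bilinear / (n - 1)


# Hypothetical data: 4 items (rows) scored by 3 observers (columns).
y = [[12, 11, 14], [15, 15, 13], [9, 10, 12], [20, 18, 21]]
print(ml_sigma1_sq(y) == q_t(y, 0))
```

Agreement for one data set is of course only a spot check of the algebra above.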

The form of Q, may be simplified through some extremely cumbersome algebra. We shall 
simply sketch this reduction and note that further details are given by Russell (1956). 





M, the matrix of $Q_t$, has been defined in (4-12) to (4-15). We now consider the matrix form,

    $n(n-1)(r-1)(r-2) Q_t = Y' \boldsymbol{\mu} Y$    (4-22)

and $n(n-1)(r-1)(r-2) M = \boldsymbol{\mu}$. We shall let $\boldsymbol{\mu}_{ij}$ represent the (i, j)th column vector of
$\boldsymbol{\mu}$ and $\boldsymbol{\mu}_{ij}$ contains nr elements. To begin the simplification of $Q_t$, it is shown that

    $Y' \boldsymbol{\mu}_{iq} = nr(y_{i.} - y_{..}) - n(y_{iq} - y_{.q}) - n(r-1)(y_{it} - y_{.t})$  $(q \ne t)$,    (4-23)
and
    $Y' \boldsymbol{\mu}_{it} = nr(r-2)(y_{it} - y_{.t}) - nr(r-2)(y_{i.} - y_{..})$.    (4-24)

The notation is conventional:

    $y_{i.} = \frac{1}{r} \sum_j y_{ij}$,  $y_{.j} = \frac{1}{n} \sum_i y_{ij}$,  $y_{..} = \frac{1}{nr} \sum_i \sum_j y_{ij}$.    (4-25)

Now (4-23) and (4-24) define the nr elements of the row vector $Y'\boldsymbol{\mu}$ and vector multiplication
$(Y'\boldsymbol{\mu}) Y$ yields a result which may be reduced for the final form,

    $(n-1)(r-1)(r-2) Q_t = r(r-1) \sum_i (y_{it} - y_{i.} - y_{.t} + y_{..})^2 - \sum_i \sum_j (y_{ij} - y_{i.} - y_{.j} + y_{..})^2$.    (4-26)

$Q_t$ in (4-26) is identical with the estimators recommended by Grubbs and Ehrenberg and is
now expressed in the form that they used.
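
In practice (4-26) is the form used for computation. A minimal Python sketch follows; it is not from the paper, the function name and the scores are hypothetical, and ordinary floating-point arithmetic is assumed to be adequate. It returns all r estimates together with the error sum of squares of the two-way analysis of variance.

```python
def one_way_variances(y):
    """Return ([Q_1, ..., Q_r], E) using (4-26); y is a list of n rows of r values."""
    n, r = len(y), len(y[0])
    if n < 2 or r < 3:
        raise ValueError("(4-26) needs at least 2 rows and at least 3 columns")
    row = [sum(yi) / r for yi in y]
    col = [sum(yi[j] for yi in y) / n for j in range(r)]
    grand = sum(row) / n
    # Residuals y_ij - y_i. - y_.j + y_.. of the two-way classification.
    resid = [[y[i][j] - row[i] - col[j] + grand for j in range(r)] for i in range(n)]
    err = sum(x * x for ri in resid for x in ri)
    scale = (n - 1) * (r - 1) * (r - 2)
    q = [(r * (r - 1) * sum(ri[t] ** 2 for ri in resid) - err) / scale
         for t in range(r)]
    return q, err


# Hypothetical scores: 5 items (rows) by 4 judges (columns).
scores = [
    [6.0, 7.0, 5.5, 6.5],
    [4.0, 4.5, 4.0, 2.5],
    [8.0, 7.5, 8.5, 9.5],
    [5.0, 5.5, 5.0, 6.5],
    [7.0, 6.5, 7.5, 5.0],
]
q, err = one_way_variances(scores)
error_mean_square = err / ((len(scores) - 1) * (len(scores[0]) - 1))
print(q, error_mean_square)
```

Summing (4-26) over t shows that the average of the $Q_t$ equals the usual error mean square, which gives a convenient arithmetic check on any computation.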

Note that $Q_t$ may be negative. If this causes difficulty, it is an interpretative one similar
to that experienced in the estimation of variance components. In the special case with
r = 3, it can be shown that only one $Q_t$ may be negative; it may be seen from (5-4) and (5-5)
in the following section that the sum of any two Q’s must be non-negative. A general rule
when r > 3 is not evident.

To conclude this section, we note a result obtained by both Grubbs and Ehrenberg. They
found the variance of $Q_t$,

    $V(Q_t) = \frac{2}{(n-1)} \sigma_t^4 + \frac{4}{(n-1)(r-1)^2} \sigma_t^2 \sum_{j \ne t} \sigma_j^2 + \frac{4}{(n-1)(r-1)^2(r-2)^2} \sum_{j < h;\ j, h \ne t} \sigma_j^2 \sigma_h^2$.    (4-27)

This result may be used to obtain an approximation to the distribution of $Q_t/\sigma^2$ when
$\sigma_j^2 = \sigma^2$ $(j = 1, \ldots, r)$, as discussed in the next section. However, there, $Q_t$ is expressed in
a different form and its moments may be obtained more directly.


5. DISTRIBUTION THEORY AND TESTS OF SIGNIFICANCE


In this section we shall consider the nature of the distribution of $Q_t$ and propose several
tests of significance for hypotheses of assumed equality of the variances $\sigma_j^2$. To facilitate
this work, it is desirable to express $Q_t$ in terms of $(n-1)(r-1)$ new variables instead of the
original nr observations shown in (4-26). We shall consider this latter problem first and only
indicate the algebraic procedure.

Consider the transformation 


    $\tilde{Y} = WY$    (5-1)

of the nr observations in the column vector Y and take W to be orthogonal. There are many
matrices W that would suffice, but we take W as described below. Let $\Gamma_n$ be the orthogonal
Helmert matrix of n rows and n columns with the row of identical elements placed in the
last or nth row position. The matrix

    $\Gamma = \Gamma_n \times \Gamma_r$    (5-2)

is orthogonal. W is the matrix obtained by placing the jnth row of $\Gamma$ in the $[(n-1)(r-1)+j]$th
position of W $[j = 1, \ldots, (r-1)]$, while the remaining $(n-1)(r-1)$ of the first $n(r-1)$ rows
of $\Gamma$ are retained in order in the first $(n-1)(r-1)$ row positions of W. The last n rows of W
are identical to the corresponding rows of $\Gamma$. The inverse transformation to (5-1) is required
for substitution of the new variables in the form (4-26) for $Q_t$. Then,

    $Y = W'\tilde{Y}$.    (5-3)

Substitution in (4-26), taking advantage of the orthogonality of W', lets us write $Q_t$ in terms
of the first $(n-1)(r-1)$ new variables, $\tilde{y}_1, \ldots, \tilde{y}_{(n-1)(r-1)}$. (We are using a single subscript
notation for the new variables.) The algebraic details are given by Russell (1956) and we
simply note here that

    $(n-1)(r-1)(r-2) Q_t = r(r-1) \sum_{p=1}^{n-1} \Bigl[ \sqrt{\tfrac{t-1}{t}}\, \tilde{y}_{(t-2)(n-1)+p} - \sum_{q=t-1}^{r-2} \frac{\tilde{y}_{q(n-1)+p}}{\sqrt{(q+1)(q+2)}} \Bigr]^2 - \sum_{u=1}^{(n-1)(r-1)} \tilde{y}_u^2$,    (5-4)

where t = 1, ..., r and, when t = r, the second term in parentheses is defined to be zero.
When t = r, we have the simpler form,

    $(n-1)(r-1)(r-2) Q_r = r(r-2) \sum_{p=1}^{n-1} \tilde{y}^2_{(r-2)(n-1)+p} - \sum_{u=1}^{(n-1)(r-2)} \tilde{y}_u^2$.    (5-5)

The asymmetry of forms of $Q_t$ for various values of t is introduced by the transformation
(5-1); rearrangement of the rows of W would permit showing any specified $Q_t$ in a form like
that for $Q_r$ in (5-5). It follows from the form of W that $\tilde{y}_1, \ldots, \tilde{y}_{(n-1)(r-1)}$ have zero means,
are normal, and are independent with equal variances $\sigma^2$ if $\sigma_j^2 = \sigma^2$ $(j = 1, \ldots, r)$.
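
The reduction to (5-5) can be checked numerically. The Python sketch below is an illustration under stated assumptions and is not from the paper: the function names and data are hypothetical, and rather than assembling the full matrix W it keeps the transformed variables in a two-index array, with the entry in position (p, k) formed from row p of the n by n Helmert matrix and row k of the r by r Helmert matrix. The first n-1 by r-1 entries are the error contrasts, and the column with k = r-1 (the last non-constant Helmert row) supplies the first sum in (5-5).

```python
import math


def helmert(m):
    """m x m orthogonal Helmert matrix with the row of identical elements last."""
    rows = []
    for k in range(1, m):
        norm = math.sqrt(k * (k + 1))
        rows.append([1 / norm] * k + [-k / norm] + [0.0] * (m - k - 1))
    rows.append([1 / math.sqrt(m)] * m)
    return rows


def transformed(y):
    """t[p][k] = sum_i sum_j helmert(n)[p][i] * helmert(r)[k][j] * y[i][j]."""
    n, r = len(y), len(y[0])
    gn, gr = helmert(n), helmert(r)
    return [[sum(gn[p][i] * gr[k][j] * y[i][j] for i in range(n) for j in range(r))
             for k in range(r)] for p in range(n)]


def q_last_from_contrasts(y):
    """Q_r from the right-hand member of (5-5), and E as a sum of squared contrasts."""
    n, r = len(y), len(y[0])
    t = transformed(y)
    s = sum(t[p][r - 2] ** 2 for p in range(n - 1))                    # first sum
    rest = sum(t[p][k] ** 2 for p in range(n - 1) for k in range(r - 2))
    return (r * (r - 2) * s - rest) / ((n - 1) * (r - 1) * (r - 2)), s + rest


def q_last_direct(y):
    """Q_r and E computed from the original observations through (4-26)."""
    n, r = len(y), len(y[0])
    row = [sum(yi) / r for yi in y]
    col = [sum(yi[j] for yi in y) / n for j in range(r)]
    grand = sum(row) / n
    e = [[y[i][j] - row[i] - col[j] + grand for j in range(r)] for i in range(n)]
    err = sum(x * x for ei in e for x in ei)
    g_r = sum(ei[r - 1] ** 2 for ei in e)
    return (r * (r - 1) * g_r - err) / ((n - 1) * (r - 1) * (r - 2)), err


# Hypothetical data: 3 items by 4 observers.
data = [[6.1, 5.8, 6.4, 7.0], [4.9, 5.2, 5.0, 4.1], [7.3, 7.1, 7.6, 8.2]]
print(q_last_from_contrasts(data), q_last_direct(data))
```

The two routes should agree up to rounding error, both for $Q_r$ and for the error sum of squares.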
Test 1 
Consider the given conditions: 
Conditions 1:  $\sigma_j^2 = \sigma^2$  $[j = 1, \ldots, (r-1)]$,    (5-6)
               $\sigma^2$ known,
and the null hypothesis,  $H_0$: $\sigma_r^2 = \sigma^2$.    (5-7)
The alternative hypothesis is  $H_a$: $\sigma_r^2 \ne \sigma^2$    (5-8)

but with Conditions 1 retained. We compare $Q_r$ with $\sigma^2$ and we need the distribution of
$Q_r/\sigma^2$ under Conditions 1 and $H_0$. Now $\tilde{y}_1/\sigma, \ldots, \tilde{y}_{(n-1)(r-1)}/\sigma$ are independent standard normal
variates and the right-hand member of (5-5) depends on two independent sums of squares.
If we divide both sides of (5-5) by $\sigma^2$, we note that

    $(n-1)(r-1)(r-2) Q_r/\sigma^2 = r(r-2) \chi^2_{(n-1)} - \chi^2_{(n-1)(r-2)}$,    (5-9)

where the $\chi^2$’s have the indicated degrees of freedom and are independent.
The distribution of the difference of two independent $\chi^2$-variates has been studied by
Pachares (1953) and Gurland (1955). Pachares has shown that an indefinite quadratic form
has a density function that can be expressed in terms of Bessel functions; Gurland used
a finite series of Laguerre polynomials. In the special case in which n = r = 3, it is easy to
see that the density function of $q = Q_r/\sigma^2$ is

    $f(q) = e^{-2q/3}/2$  $(q > 0)$,
    $f(q) = e^{2q}/2$  $(q < 0)$.


An examination of the moments of $Q_r/\sigma^2$ suggests that its distribution may be approximated
by taking $C Q_r/\sigma^2$ to have a $\chi^2$-distribution with C degrees of freedom with

    $C = \frac{(n-1)(r-1)(r-2)}{r^2 - r - 1}$.    (5-10)

Grubbs essentially used this approximation with r = 3, but the goodness of the approximation
is not known. It does not seem necessary to attempt to table the exact distribution of
$Q_r/\sigma^2$ in view of the results of Test 2 below.
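
The moment argument can be made concrete. A $\chi^2$-variate with $\nu$ degrees of freedom has mean $\nu$ and variance $2\nu$, so the mean and variance of $Q_r/\sigma^2$ follow directly from (5-9); equating them to the mean 1 and variance 2/C of a $\chi^2$-variate with C degrees of freedom divided by C gives a value for C. The Python sketch below is illustrative only: the function name is hypothetical, and matching the first two moments is an assumption about how the approximation is to be fitted.

```python
from fractions import Fraction


def chi_square_approximation(n, r):
    """Mean and variance of Q_r / sigma^2 from (5-9), and C with 2 / C = variance."""
    scale = (n - 1) * (r - 1) * (r - 2)
    df_first, df_second = n - 1, (n - 1) * (r - 2)
    weight = r * (r - 2)
    mean = Fraction(weight * df_first - df_second, scale)
    variance = Fraction(2 * weight ** 2 * df_first + 2 * df_second, scale ** 2)
    return mean, variance, 2 / variance


mean, variance, c_value = chi_square_approximation(10, 4)
print(mean, variance, c_value)
```

The mean is exactly one for every n and r, which is the unbiasedness of $Q_r$, so only the variance carries information about C.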

Note that symmetry does exist and our discussion here applies not only to $Q_r$ but to any
$Q_j$ selected from values j = 1, ..., r. However, correlations between the Q’s do exist.


Test 2 
Let us again consider the situation of Test 1 except we now have
Conditions 2:  $\sigma_j^2 = \sigma^2$  $[j = 1, \ldots, (r-1)]$,    (5-11)
               $\sigma^2$ unknown.
It is now essentially necessary to compare $Q_r$ with an estimate of $\sigma^2$ and in fact an exact
test results.
Let the error sum of squares of the analysis of variance for the two-way classification be

    $E = \sum_i \sum_j (y_{ij} - y_{i.} - y_{.j} + y_{..})^2 = \sum_{u=1}^{(n-1)(r-1)} \tilde{y}_u^2$.    (5-12)

Under Conditions 2 and $H_0$, E has expectation $(n-1)(r-1)\sigma^2$. Division of both sides of
(5-5) by E lets us write

    $(n-1)(r-1)(r-2) Q_r/E = \frac{(r-2)(rF - 1)}{F + (r-2)}$,    (5-13)
where
    $F = \frac{\sum_{p=1}^{n-1} \tilde{y}^2_{(r-2)(n-1)+p} / (n-1)}{\sum_{u=1}^{(n-1)(r-2)} \tilde{y}_u^2 / (n-1)(r-2)}$    (5-14)

and, under Conditions 2 and $H_0$, F has the usual variance-ratio distribution with $(n-1)$ and
$(n-1)(r-2)$ degrees of freedom. (5-13) follows easily from (5-5) when it is noted that E
may be expressed as the sum of the two sums of squares in (5-14). Through the association
of (5-5) and (4-26), F may be expressed in terms of the original observations as

    $F = \frac{r(r-2) G_r}{(r-1) E - r G_r}$,    (5-15)
where
    $G_r = \sum_i (y_{ir} - y_{i.} - y_{.r} + y_{..})^2$.    (5-16)








Now $Q_r/E$ is a monotone increasing function of F and we use F as the test statistic rather
than a multiple of $Q_r/E$.


We may test $H_0$ under Conditions 2 against one-sided or two-sided alternatives,

    $H_{a1}$: $\sigma_r^2 > \sigma^2$,    (5-17)
    $H_{a2}$: $\sigma_r^2 < \sigma^2$,    (5-18)
    $H_{a3}$: $\sigma_r^2 \ne \sigma^2$.    (5-19)

If the test has significance level $\alpha$ and if $F_\alpha(\nu_1, \nu_2)$ denotes the tabular value of F with $\nu_1$
and $\nu_2$ degrees of freedom such that $P[F > F_\alpha(\nu_1, \nu_2)] = \alpha$, the rules of rejection corresponding
respectively to the alternative hypotheses (5-17) to (5-19) are reject $H_0$ if

(i) $F > F_\alpha[(n-1), (n-1)(r-2)]$,

(ii) $F < 1/F_\alpha[(n-1)(r-2), (n-1)]$

and (iii) $F > F_{\alpha/2}[(n-1), (n-1)(r-2)]$ or $F < 1/F_{\alpha/2}[(n-1)(r-2), (n-1)]$.
Again, some other $Q_j$ may be substituted for $Q_r$ in the results for Test 2 and the results hold.

Tests 1 and 2 appear to be useful when one new analytical method is introduced for 

comparison with established methods that are known to have homogeneous variances. 
Similarly, a new judge may be introduced on a panel of trained and consistent judges. 


Test 3 
We should like a test of homogeneity of variances, a test of the null hypothesis, 


    $H_0$: $\sigma_j^2 = \sigma^2$  $(j = 1, \ldots, r)$,
against the general alternative hypothesis,
    $H_a$: $\sigma_j^2 \ne \sigma^2$  (for at least some j).

We shall assume $\sigma^2$ unknown. Extreme difficulties were encountered in developing exact
distribution theory for this test, even in the case with n = r = 3. Instead we have developed
a test based on large-sample likelihood ratio theory, but only for the case in which r = 3.

We have shown in § 4 that $Q_j$ is identical to $\hat\sigma_j^2$ (j = 1, ..., 3), when r = 3. $\hat\sigma_j^2$ is the maximum-likelihood
estimator of $\sigma_j^2$ in reference to the likelihood function of (3-14). The logarithm
L of the likelihood function is defined following (3-18). We require $L(\Omega)$, the maximum
of L under $H_a$, obtained by substituting $Q_j$ for $\sigma_j^2$ in L, and $L(\omega)$ obtained under $H_0$
by substituting the maximum-likelihood estimator of $\sigma^2$ for each $\sigma_j^2$ in L. We now proceed
to derive $L(\Omega)$ and $L(\omega)$.

To evaluate $L(\Omega)$, we essentially need to evaluate $|\hat\Sigma_z^{-1}|$ and $-\tfrac12 Z'\hat\Sigma_z^{-1}Z$, where the caret
indicates that $Q_j$ is substituted for $\sigma_j^2$ (j = 1, 2, 3), in $\Sigma_z^{-1}$. Now $\Sigma_z^{-1} = (D_n D_n')^{-1} \times H^{-1}$ and,
to evaluate $\hat\Sigma_z^{-1}$, we need $\hat{H}^{-1}$ since $(D_n D_n')^{-1}$ is a matrix of known constants. When r = 3,
$H^{-1}$ is given in (4-18), the estimators are defined in (3-24) to (3-26) in terms of matrices
$\| f_{jsu} \|$, these matrices are shown in (3-23), and definition of the matrices is complete with
the derivatives of (4-19). |H| follows from (3-13). A series of substitutions based on the
results referenced immediately above leads to the new results,

    $\hat{H}^{-1} = [|\hat{H}|(n-1)]^{-1} \begin{pmatrix} Z_2'(D_n D_n')^{-1} Z_2 & -Z_1'(D_n D_n')^{-1} Z_2 \\ -Z_1'(D_n D_n')^{-1} Z_2 & Z_1'(D_n D_n')^{-1} Z_1 \end{pmatrix}$    (5-20)

and
    $|\hat{H}| = (n-1)^{-2} \{ [Z_1'(D_n D_n')^{-1} Z_1][Z_2'(D_n D_n')^{-1} Z_2] - [Z_1'(D_n D_n')^{-1} Z_2]^2 \}$,    (5-21)

where $Z_1'$ and $Z_2'$ are $(n-1)$-element row vectors consisting respectively of the first $(n-1)$
and remaining $(n-1)$ elements of the $2(n-1)$-element row vector Z'. The matrix multiplication,
$Z'[(D_n D_n')^{-1} \times \hat{H}^{-1}]Z$, permits us to write

    $-\tfrac12 Z'\hat\Sigma_z^{-1}Z = -(n-1)$    (5-22)
in view of (5-20) and (5-21). By the rules of direct-product multiplication of matrices,*
we have
    $|\hat\Sigma_z^{-1}| = |(D_n D_n')^{-1} \times \hat{H}^{-1}| = |(D_n D_n')^{-1}|^2\, |\hat{H}^{-1}|^{(n-1)} = n^{-2} |\hat{H}^{-1}|^{(n-1)}$,    (5-23)

the latter form following since $|(D_n D_n')^{-1}| = 1/n$, a result that may be developed from (3-16).
We also require $|\hat{H}|$ in terms of the estimators instead of the form (5-21) above; then,
from (4-18), it follows that

    $|\hat{H}| = Q_1 Q_2 + Q_1 Q_3 + Q_2 Q_3$.    (5-24)

(5-23) and (5-24) permit evaluation of $|\hat\Sigma_z^{-1}|$. Substitution in L of the results of (5-22),
(5-23), and (5-24) yields the final form

    $L(\Omega) = -\tfrac12\{(n-1)(r-1)\} \ln 2\pi - \ln n - \tfrac12 (n-1) \ln (Q_1 Q_2 + Q_1 Q_3 + Q_2 Q_3) - (n-1)$.    (5-25)


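We now evaluate $L(\omega)$. In this case with $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma^2$, we have

    $\Sigma_z^{-1} = \frac{1}{3\sigma^2} (D_n D_n')^{-1} \times \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$    (5-26)
and
    $|\Sigma_z^{-1}| = n^{-2}\, 3^{-(n-1)}\, \sigma^{-4(n-1)}$    (5-27)
from (3-15) and (4-18), and

    $-\tfrac12 Z'\Sigma_z^{-1}Z = -\frac{1}{6\sigma^2} Z' \Bigl[ (D_n D_n')^{-1} \times \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} \Bigr] Z$    (5-28)

    $= -\frac{1}{6\sigma^2} Z' \| f_{1su} + f_{2su} + f_{3su} \| Z$    (5-29)

    $= -\frac{(n-1)}{3\sigma^2} (Q_1 + Q_2 + Q_3)$.    (5-30)

(5-29) follows from the definitions (3-23) and the derivatives (4-19); (5-30) follows from (3-24)
to (3-26) and the equivalence of $Q_j$ and $\hat\sigma_j^2$ (j = 1, 2, 3), when r = 3. Then L becomes

    $L_\omega = -\tfrac12\{(n-1)(r-1)\} \ln 2\pi - \ln n - \tfrac12 (n-1) \ln 3 - (n-1) \ln \sigma^2 - \frac{(n-1)}{3\sigma^2} (Q_1 + Q_2 + Q_3)$.    (5-31)

The maximum likelihood estimator of $\sigma^2$ obtained from $L_\omega$ is
    $\hat\sigma^2 = \tfrac13 (Q_1 + Q_2 + Q_3)$    (5-32)
and substitution in $L_\omega$ leads us to
    $L(\omega) = -\tfrac12\{(n-1)(r-1)\} \ln 2\pi - \ln n - \tfrac12 (n-1) \ln 3 - (n-1) - (n-1) \ln \tfrac13 (Q_1 + Q_2 + Q_3)$.    (5-33)
Note further that
    $\tfrac13 (Q_1 + Q_2 + Q_3) = E / 2(n-1)$    (5-34)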
* See MacDuffee (1946), p. 82. 








124 One-way variances in a two-way classification 
from (4-26) and (5-12). With substitution of (5-34) in (5-33), we finally obtain
$-2\ln\lambda = -2[L(\hat{\omega}) - L(\hat{\Omega})]$
$= -(n-1)\,[\ln(Q_1Q_2 + Q_1Q_3 + Q_2Q_3) - 2\ln E + 2\ln(n-1) + \ln\tfrac{4}{3}]$.   (5-35)

An approximate test of $H_0$ against $H_a$ is obtained by taking $-2\ln\lambda$ to have a $\chi^2$-distribution
with 2 D.F. This test becomes approximately correct for large values of $n$. The computation
of $-2\ln\lambda$ causes little difficulty when values of $Q_1$, $Q_2$ and $Q_3$ are known. We know
that values of $Q_j$ may sometimes be negative. However $(Q_1Q_2 + Q_1Q_3 + Q_2Q_3)$ is positive
with probability one. To show this, we write

$Q_1Q_2 + Q_1Q_3 + Q_2Q_3 = (Q_1 + Q_2)(Q_1 + Q_3) - Q_1^2$   (5-36)

and substitute in the right-hand member of (5-36) using the forms (5-4) with $r = 3$. Then,
we need to show that

$\sum_{g=1}^{n-1} y_g^2 \; \sum_{g=1}^{n-1} y_{g+(n-1)}^2 > \left( \sum_{g=1}^{n-1} y_g\, y_{g+(n-1)} \right)^2.$

But this is Cauchy's inequality, true for all values of the variables except when $y_g/y_{g+(n-1)}$
is a constant for all $g = 1, \ldots, (n-1)$. The exception is an event that occurs with probability
zero and yields an equality instead of the required inequality.


6. COMPUTATIONAL TECHNIQUES AND A NUMERICAL EXAMPLE 


We illustrate the computations required to apply the methods here developed through 
consideration of an example. The example below is based on chemical determinations; a 
second example is given in detail by Bradley (1958) and is based on the subjective scoring 
of dried veneer as described by Kauman, Gottstein & Lantican (1956). 

In the distilling industry, ground grains, usually corn and rye with about 1% of malt, 
are slurried with water and cooked with constant agitation to 100°C. This gelatinizes the 
starch. The term ‘mash’ is applied to this cooked grain slurry. The mash is cooled to 62°C. 
and additional malt meal added. The malt converts the starch to sugars. After 20-30 min. 
at 62°C., the converted mash is cooled to room temperature and pumped through pipe 
coolers to the fermentors. Several such cooks are required for each fermentor. A yeast mash 
containing actively growing yeast is added and the mixture diluted to a definite volume in 
the fermentor. The latter is then said to be ‘set’. At the end of 72 or 96 hr., a small sample is 
taken from the fermentor and analysed for alcohol. The alcohol yield is expressed in terms 
of proof gallons per bushel (56 lb.) of grain. A proof gallon is 1 gal. containing 50% alcohol 
by volume. The data in Table 1 are the yields, suitably coded, from each of three fermentors 
set each day for 38 days in a distillery. It is desired to investigate the variabilities of the 
fermentor yields and day-to-day variability indicates that it is necessary to consider the 
data in the two-way classification. 

We first compute the estimates of variability, $Q_1$, $Q_2$ and $Q_3$, for the three fermentors.
The row and column means $y_{i.}$ and $y_{.j}$ are shown on the borders of Table 1; row and column
totals would normally also appear but have been omitted here to save space. In Table 2
we show the residuals $(y_{ij} - y_{i.} - y_{.j} + y_{..})$ required in the computation. Row and column
totals in Table 2 should be zero and this is a check on the computation; slight departures
from these zero totals in Table 2 result from rounding errors. We have sometimes found it
helpful to compute a table of values of $(y_{i.} + y_{.j} - y_{..})$ as an intermediate step between




Tables 1 and 2 but do not show such a table here. The sums of squares of the entries in
columns of Table 2 are shown at the bottom of that table and designated as $G_1$, $G_2$, $G_3$ in
accordance with the definition (5-16). Now, of course, $E = G_1 + G_2 + G_3$, the error sum of
squares of analysis of variance, and, from (5-12), (5-16) and (4-26), we may write

$Q_j = \frac{r\,G_j}{(n-1)(r-2)} - \frac{E}{(n-1)(r-1)(r-2)} \quad (j = 1, \ldots, r).$   (6-1)

A further computing check is effected through the comparison of $\sum_{j=1}^{r} G_j$ with $E$ from the
analysis of variance, for it is likely that the analysis of variance will have been computed
in most cases. In our example $r = 3$ and $n = 38$. Using (6-1) and $G_1$ and $E$ from Table 2,
we illustrate by noting that

$Q_1 = \frac{3(0.032520)}{(37)(1)} - \frac{0.081414}{(37)(2)(1)} = 0.001537.$








Similar computation leads to the remaining values of Q; in the last row of Table 2. 
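The computation of the $Q_j$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `one_way_variance_estimates` is hypothetical, and the formula is (6-1) written over a common denominator, which reproduces the printed arithmetic for $Q_1$. Recomputing from the tabulated $G_j$ gives values for $Q_2$ and $Q_3$ that agree with the last row of Table 2 only to within a few units in the sixth decimal place.

```python
def one_way_variance_estimates(G, E, n):
    """Estimates Q_j from the column sums of squared residuals G_j.

    Uses Q_j = [r(r-1) G_j - E] / [(n-1)(r-1)(r-2)], which is (6-1)
    placed over a common denominator. n is the number of rows (days).
    """
    r = len(G)
    if r < 3:
        raise ValueError("the estimators require r >= 3 columns")
    denom = (n - 1) * (r - 1) * (r - 2)
    return [(r * (r - 1) * g - E) / denom for g in G]


# Column sums of squares from the foot of Table 2.
G = [0.032520, 0.034786, 0.014108]
E = sum(G)          # error sum of squares, 0.081414
Q = one_way_variance_estimates(G, E, n=38)

# Computing check: the Q_j must add up to r E / ((n-1)(r-1)).
check = 3 * E / (37 * 2)
print([round(q, 6) for q in Q], round(sum(Q) - check, 12))
```

The final line prints the three estimates together with the discrepancy in the check, which should be zero apart from floating-point rounding.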

The process of charging the fermentors is such that the first fermentor receives less than 
its share of grains and the third fermentor more than its share of grains. The yield of the 
third fermentor should be higher than that of the first and this is substantiated by the data. 
It appeared from experience that the variability in yield of the third fermentor was less than
those for the other two. We apply Test 2 to the sample data to compare fermentor yield
variabilities.

Recall that Test 2 supposes that $\sigma_1^2 = \sigma_2^2 = \sigma^2$ and we test the hypothesis, $H_0$: $\sigma_3^2 = \sigma^2$,
against the alternative, $H_a$: $\sigma_3^2 < \sigma^2$, of (5-18). We use the F-statistic of (5-15) and the rule
of rejection (ii). For the example

$F = \frac{3(1)(0.014108)}{(2)(0.081414) - (3)(0.014108)} = 0.35,$

obtained through substitution of the results of Table 2 in (5-15). Here $F$ has $(n-1) = 37$
and $(n-1)(r-2) = 37$ D.F. Taking the significance level to be 0.05, we find the tabular
value of $F_{0.05}[(n-1)(r-2), (n-1)]$ to be 1.72 and $1/F_{0.05}$ to be 0.58. Clearly the observed
value of $F$ is significant at the 0.05 level and actually is very close to the value for the 0.001
level of significance. The alternative hypothesis $H_a$ is accepted.

We also apply the approximate Test 3 to the given data to illustrate the use of that test. 
This is the general test of homogeneity of variances given r = 3. Substitution is made 
directly in (5-35) and we obtain, with the use of common logarithms and a conversion factor 
to natural logarithms 


$-2\ln\lambda = -(37)(2.3026)\,[\log\{(0.001537)(0.001722)
+ (0.001537)(0.000041) + (0.001722)(0.000041)\}
- 2\log 0.081398 + 2\log 37 + \log 4/3]
= 9.83.$
$-2\ln\lambda$ is taken to have a $\chi^2$-distribution with 2 D.F. and the observed value lies between the
tabular values for 0.01 and 0.005 levels of significance.
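A short check of this computation is given below, using natural logarithms directly in place of common logarithms and the conversion factor. It is a sketch only: the function names are hypothetical, and the tail probability relies on the standard fact that a $\chi^2$ variate with 2 D.F. has upper tail $e^{-x/2}$. With the rounded values of $Q_j$ quoted in the text the statistic comes out a few hundredths above 9.83, presumably because the printed figure was obtained from less heavily rounded quantities.

```python
import math


def neg2_log_lambda(Q, E, n):
    """-2 ln(lambda) of (5-35) for r = 3 columns and n rows."""
    q1, q2, q3 = Q
    s = q1 * q2 + q1 * q3 + q2 * q3
    if s <= 0:
        # Excluded with probability one by the Cauchy-inequality argument.
        raise ValueError("Q1Q2 + Q1Q3 + Q2Q3 must be positive")
    return -(n - 1) * (
        math.log(s) - 2 * math.log(E) + 2 * math.log(n - 1) + math.log(4 / 3)
    )


def chi2_upper_tail_2df(x):
    """P(chi-square with 2 D.F. exceeds x) = exp(-x/2)."""
    return math.exp(-x / 2)


statistic = neg2_log_lambda(Q=(0.001537, 0.001722, 0.000041), E=0.081398, n=38)
p_value = chi2_upper_tail_2df(statistic)
print(round(statistic, 2), round(p_value, 4))
```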


Tests 2 and 3 are those of primary importance and they have been demonstrated with the 
example. Test 1 will not usually be useful and has not been discussed in this section. 


Table 1. Coded yields y_ij for three fermentors for 38 days

                  Fermentors             Means
  Days        1        2        3         y_i.

    1        4.84     4.98     5.13      4.983
    2        4.85     5.08     5.13      5.020
    3        4.76     4.94     5.03      4.910
    4        4.79     5.06     5.13      4.993
    5        4.74     5.03     5.12      4.963

    6        4.79     4.96     5.08      4.943
    7        4.84     4.95     5.09      4.960
    8        4.77     4.91     5.03      4.903
    9        4.79     4.95     5.05      4.930
   10        4.90     4.90     5.09      4.963

   11        4.80     4.95     5.12      4.957
   12        4.79     5.02     5.10      4.970
   13        4.88     5.03     5.13      5.013
   14        4.84     5.00     5.10      4.980
   15        4.82     4.99     5.08      4.963

   16        4.88     5.03     5.13      5.013
   17        4.80     4.99     5.06      4.950
   18        4.90     5.04     5.19      5.043
   19        4.90     5.02     5.13      5.017
   20        4.90     4.98     5.13      5.003

   21        4.92     5.04     5.19      5.050
   22        4.90     5.07     5.18      5.050
   23        4.85     5.02     5.18      5.017
   24        4.94     5.03     5.19      5.053
   25        4.90     4.99     5.15      5.013

   26        4.90     5.00     5.17      5.023
   27        4.90     5.04     5.19      5.043
   28        4.87     5.05     5.21      5.043
   29        4.82     4.96     5.16      4.980
   30        4.92     4.97     5.13      5.007

   31        4.77     4.97     5.03      4.923
   32        4.85     5.02     5.13      5.000
   33        4.87     4.98     5.19      5.013
   34        4.87     5.01     5.12      5.000
   35        4.82     4.90     5.10      4.940

   36        4.84     4.95     5.13      4.973
   37        4.74     4.97     5.08      4.930
   38        4.84     5.00     5.09      4.977

  Means
   y_.j      4.845    4.994    5.123     y_.. = 4.987




















Table 2. Residuals (y_ij - y_i. - y_.j + y_..) for three fermentors for 38 days

                       Fermentors
  Days          1            2            3

    1        -0.001       -0.010        0.011
    2        -0.028        0.053       -0.026
    3        -0.008        0.023       -0.016
    4        -0.061        0.060        0.001
    5        -0.081        0.060        0.021

    6        -0.011        0.010        0.001
    7         0.022       -0.017       -0.006
    8         0.009        0.000       -0.009
    9         0.002        0.013       -0.016
   10         0.079       -0.070       -0.009

   11        -0.015       -0.014        0.027
   12        -0.038        0.043       -0.006
   13         0.009        0.010       -0.019
   14         0.002        0.013       -0.016
   15        -0.001        0.020       -0.019

   16         0.009        0.010       -0.019
   17        -0.008        0.033       -0.026
   18        -0.001       -0.010        0.011
   19         0.025       -0.004       -0.023
   20         0.039       -0.030       -0.009

   21         0.012       -0.017        0.004
   22        -0.008        0.013       -0.006
   23        -0.025       -0.004        0.027
   24         0.029       -0.030        0.001
   25         0.029       -0.030        0.001

   26         0.019       -0.030        0.011
   27        -0.001       -0.010        0.011
   28        -0.031        0.000        0.031
   29        -0.018       -0.027        0.044
   30         0.055       -0.044       -0.013

   31        -0.011        0.040       -0.029
   32        -0.008        0.013       -0.006
   33        -0.001       -0.040        0.041
   34         0.012        0.003       -0.016
   35         0.022       -0.047        0.024

   36         0.009       -0.030        0.021
   37        -0.048        0.033        0.014
   38         0.005        0.016       -0.023

   G_j        0.032520     0.034786     0.014108
   Q_j        0.001537     0.001722     0.000041

   E = G_1 + G_2 + G_3 = 0.081414
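The passage from Table 1 to Table 2 can be illustrated with a short Python sketch. It is offered under stated assumptions: the yields are transcribed from Table 1, with the day 11 row and the first yield of day 12 taken as the values implied by their row means and by the residuals of Table 2, since the printed figures are hard to read there. Because the program carries unrounded means, the sums of squares agree with the tabulated $G_j$ only approximately.

```python
# Coded yields of Table 1, one tuple (fermentor 1, 2, 3) per day.
# Day 11 and the first entry of day 12 are assumptions inferred from the
# row means and from the residuals in Table 2.
yields = [
    (4.84, 4.98, 5.13), (4.85, 5.08, 5.13), (4.76, 4.94, 5.03),
    (4.79, 5.06, 5.13), (4.74, 5.03, 5.12), (4.79, 4.96, 5.08),
    (4.84, 4.95, 5.09), (4.77, 4.91, 5.03), (4.79, 4.95, 5.05),
    (4.90, 4.90, 5.09), (4.80, 4.95, 5.12), (4.79, 5.02, 5.10),
    (4.88, 5.03, 5.13), (4.84, 5.00, 5.10), (4.82, 4.99, 5.08),
    (4.88, 5.03, 5.13), (4.80, 4.99, 5.06), (4.90, 5.04, 5.19),
    (4.90, 5.02, 5.13), (4.90, 4.98, 5.13), (4.92, 5.04, 5.19),
    (4.90, 5.07, 5.18), (4.85, 5.02, 5.18), (4.94, 5.03, 5.19),
    (4.90, 4.99, 5.15), (4.90, 5.00, 5.17), (4.90, 5.04, 5.19),
    (4.87, 5.05, 5.21), (4.82, 4.96, 5.16), (4.92, 4.97, 5.13),
    (4.77, 4.97, 5.03), (4.85, 5.02, 5.13), (4.87, 4.98, 5.19),
    (4.87, 5.01, 5.12), (4.82, 4.90, 5.10), (4.84, 4.95, 5.13),
    (4.74, 4.97, 5.08), (4.84, 5.00, 5.09),
]

n, r = len(yields), 3
row_means = [sum(row) / r for row in yields]
col_means = [sum(row[j] for row in yields) / n for j in range(r)]
grand_mean = sum(row_means) / n

# Residuals y_ij - y_i. - y_.j + y_.. ; every row and column sums to zero.
residuals = [
    [y - rm - cm + grand_mean for y, cm in zip(row, col_means)]
    for row, rm in zip(yields, row_means)
]

# Column sums of squares G_j and the error sum of squares E.
G = [sum(res[j] ** 2 for res in residuals) for j in range(r)]
E = sum(G)
print([round(g, 4) for g in G], round(E, 4))
```

Checking that each row and each column of `residuals` sums to zero is the computing check mentioned in the text; here it holds to floating-point accuracy because no intermediate rounding is applied.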






















7. DISCUSSION AND SUMMARY 


The problem of obtaining estimators for the individual, one-way, error variances in an 
unreplicated, two-way classification has been studied. The estimators obtained are equi- 
valent to those previously used by Grubbs and Ehrenberg although the methods of deriva- 
tion are different. The estimators are shown to be maximum-likelihood estimators in a 
certain sense for classifications with r = 3 columns, and also result more generally from 
consideration of reasonable restrictions imposed on a general quadratic form in the 
observations. 

Test procedures, not previously available, are developed and applied in a numerical 
example that illustrates the necessary computations. One test is for homogeneity of the 
one-way variances and is available based on large-sample theory when r = 3. Another test, 
which is exact for small samples and unrestricted values of r and n, is for homogeneity of 
the one-way variances again, but under the assumption that it is known that all but one 
specified variance are homogeneous. A discussion of the distribution of an individual 
estimate of variance is included under a third test procedure like the second discussed above 
except that then it is supposed that the common value of the homogeneous variances is 
known. We have applied the methods of this paper in several areas of research and found 
the results useful and informative. 

Test 3, the general test of homogeneity of variances, is available only when $r = 3$ and is
only asymptotically exact as $n$ becomes large. We have attempted to obtain a small-sample
test based on a statistic $T$, a sum over $j = 1, \ldots, r$ with divisor $E^2$ (its printed definition is
not legible in this copy), or on $\sum_j Q_j^2/E^2$, which is monotonically related to $T$. In the special
case in which $n = r = 3$, we obtained an exceedingly complex distribution function for $T$ which
is shown by Russell (1956). This result did not suggest the appropriate generalization for
larger values of $n$ and $r$, nor did it suggest that further consideration of the small-sample
distribution of $T$ is likely to be fruitful. Further study may lead to an approximate procedure
for values of $r > 3$.


In conclusion we should like to express our appreciation to National Distillers Products 
Corporation and particularly to Dr H. A. Conner of their Research Division for permission 
to use their data for our example. In addition, the brief description of the process yielding 
the data was prepared by Dr Conner and given here almost verbatim. Suggestions by 
D. B. Duncan, W. A. Thompson, Jr, P. D. Minton and P. N. Somerville along with their 
interest encouraged us in the development of this paper. 


REFERENCES 


BARTLETT, M. S. (1937). Some examples of statistical methods of research in agriculture and applied
biology. J.R. Statist. Soc. (Suppl.), 4, 137.

BRADLEY, R. A. (1958). Recent research in statistical problems in subjective testing. Transactions
of Twelfth Annual Convention of Amer. Soc. Quality Control.

EHRENBERG, A. S. C. (1950). The unbiased estimation of heterogeneous error variances. Biometrika,
37, 347.







GRUBBS, F. E. (1948). On estimating precision of measuring instruments and product variability.
J. Amer. Statist. Ass. 43, 243.

GURLAND, J. (1955). Distribution of definite and of indefinite quadratic forms. Ann. Math. Statist.
26, 122.

KAUMAN, W. E., GOTTSTEIN, J. W. & LANTICAN, D. (1956). Quality evaluation by numerical and
subjective methods with application to dried veneer. Biometrics, 12, 127.

MACDUFFEE, C. C. (1946). The Theory of Matrices. Corrected reprint of first edition. New York:
Chelsea Publishing Co.

PACHARES, J. (1953). On the distribution of quadratic forms. Institute of Statistics (mimeo series,
no. 75), Chapel Hill, North Carolina.

RUSSELL, T. S. (1956). Estimation of individual variation in an unreplicated two-way classification.
Ph.D. dissertation, Virginia Polytechnic Institute Library, Blacksburg, Virginia.

VARTAK, M. N. (1955). On an application of Kronecker Product of matrices to statistical designs.
Ann. Math. Statist. 26, 420.










[ 130 ] 


STUDIES IN THE HISTORY OF PROBABILITY AND STATISTICS 
VII. THE PRINCIPLE OF THE ARITHMETIC MEAN 


By R. L. PLACKETT
University of Liverpool


The history of the problem of combining a set of independent observations on the same quantity is 
traced from antiquity to the appearance in the eighteenth century of the arithmetic mean as a 
statistical concept. 


The problem of estimating parameters from observational data appears first to have 
presented itself to the Babylonian astronomers of the last three centuries B.c. Their achieve- 
ments are recorded in cuneiform script on clay tablets and have been analysed by Neuge- 
bauer (1951) who has also (1955) published a collection of the texts. The following summary 
is abstracted from his researches. Between about 500 and 300 B.C., the Babylonians developed
a systematic mathematical theory to account for the motions of the sun, moon and planets; 
and they evolved simple arithmetical schemes by which the positions of these bodies could 
be calculated at regular intervals of time. Beyond the fact that the basic parameters in the 
schemes represent a compromise between observation and the needs of computation, 
nothing has survived to indicate how they were estimated from the original data, which are 
themselves almost wholly absent. 

Rather more information is available concerning the methods by which the Greek 
astronomers analysed their observational data, for their discoveries were made possible, 
partly by developments of mathematical technique, and partly by the steady accumulation, 
since about 300 B.C., of a series of observations on the positions of stars and planets, made
with graduated instruments. The Syntaxis of Claud Ptolemy not only presents a complete 
account of what was known to them, but also contains nearly everything that survives of 
the work of their greatest representative, Hipparchus. In what follows, we refer to the 
edition in two volumes translated and annotated by Karl Manitius (1913). 

According to I, p. 133, Hipparchus noticed inequalities in the intervals of time between 
successive passages of the sun through the same solstitial point, and this suggested to him 
the question whether or not the length of the tropical year is constant. He considered, 
however, that the error in his observations and in the calculations based on them might 
amount to as much as ¼ day, and he concluded that any variation in the length of the year
was quite insignificant. Subsequently, Hipparchus estimated the maximum variation in
length as ¾ day, apparently by taking half the range of his observations (I, pp. 136-7).


In fact, Hipparchus calculates with the help of certain eclipses of the moon, observed in the immediate 
neighbourhood of fixed stars, how far the star called Spica was west of the autumnal point at each 
eclipse, and finds some indication in this way that it shows in his time a maximum distance of 6½°
and a minimum of 5¼°. Whence he draws the conclusion, since it is not well possible that Spica should have
undergone such a considerable change of position in so short a time, that probably the sun, from whose 


position Hipparchus determines the positions of the fixed stars, does not accomplish its return at equal 
intervals. 


The technique of taking the arithmetic mean of a group of comparable observations had 
not yet, however, made its appearance as a general principle. This is shown by Ptolemy’s 







estimation of the amount by which the length of a year exceeds 365 days. Hipparchus
had made the observations given below (I, pp. 134-5):

        Autumn equinox                              Spring equinox
(1) 162 B.C. Sept. 27 18h                   (1) 146 B.C. March 24 6h (11h at Alexandria)
(2) 159 B.C. Sept. 27 6h                    (2) 135 B.C. March 23/24 midnight
(3) 158 B.C. Sept. 27 12h                   (3) 128 B.C. March 23 18h
(4) 147 B.C. Sept. 26/27 midnight
(5) 146 B.C. Sept. 27 6h
(6) 143 B.C. Sept. 26 18h


Ptolemy gives (I, p. 142) a single observation of his own on the Autumn equinox, namely,
A.D. 139 Sept. 26d 7h, and compares it with the fourth observation of Hipparchus, whence he
finds that in 285 Egyptian years of 365 days, the Autumn equinox advances by 70d 7h,
which he writes as $70 + \tfrac{1}{4} + \tfrac{1}{20}$ days. He then gives (I, p. 143) a single observation of his
own on the Spring equinox, namely, A.D. 140 March 22d 13h, and by comparing it with the
first observation of Hipparchus, again arrives at an advance of $70 + \tfrac{1}{4} + \tfrac{1}{20}$ days in 285
Egyptian years. A year of $365\tfrac{1}{4}$ days would imply an advance of $71\tfrac{1}{4}$ days in 285 years, and
the decrement of $71\tfrac{1}{4} - 70\tfrac{3}{10} = \tfrac{19}{20}$ day in 285 years is equivalent to 1 day in 300 years. Thus
Ptolemy reaches the value of $365\tfrac{1}{4} - \tfrac{1}{300}$ days for the length of the year, and this is precisely
the value which Hipparchus is quoted (I, p. 145) as having found.
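Ptolemy's reduction is a short piece of exact arithmetic, and it can be retraced with rational numbers. The sketch below is illustrative only; the variable names are hypothetical, and the fractions are those of the preceding paragraph.

```python
from fractions import Fraction

years = 285

# Observed advance of the equinox over 285 Egyptian years of 365 days.
observed_advance = 70 + Fraction(1, 4) + Fraction(1, 20)     # 70 3/10 days

# Advance implied by a year of 365 1/4 days.
implied_advance = years * Fraction(1, 4)                      # 71 1/4 days

decrement = implied_advance - observed_advance                # 19/20 day
years_per_day_lost = years / decrement                        # 300 years

year_length = 365 + Fraction(1, 4) - 1 / years_per_day_lost   # 365 1/4 - 1/300
print(decrement, years_per_day_lost, year_length, float(year_length))
```

The use of `Fraction` keeps every step exact, so the equivalence of 19/20 day in 285 years with 1 day in 300 years appears without rounding.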

A similar example of Ptolemy’s veneration for Hipparchus is provided by his discussion 
of the precession of the equinoxes, a phenomenon discovered by Hipparchus, and caused 
by the motion of the pole of the equator round the pole of the ecliptic, the annual movement 
being about 50″. According to a quotation in II, p. 15, Hipparchus estimated the change in
the position of the solstices and equinoxes to be at least $\tfrac{1}{100}$° per annum. Ptolemy then gives
(II, pp. 18-20) a catalogue of the declinations of 18 stars as observed by (i) Timocharis and
Aristyllus, about 290 B.C., (ii) Hipparchus, and (iii) himself. He selects 6 stars from the
catalogue and shows that they all lead to a precessional constant of approximately $\tfrac{1}{100}$° per
annum, which is thus his estimate, whereas for Hipparchus it was a lower limit. These
unique data have been analysed by several commentators, beginning with Delambre
(1817, pp. 254-5) who showed that the average precessional constant from all 18 stars is
near the correct value, whether the changes of declination from (i) to (ii), or from (ii) to (iii),
are taken. Recently Pannekoek (1955) has confirmed the accuracy of Ptolemy's arithmetic;
and he suggests that Ptolemy selected the 6 stars which agreed best with the value of
$\tfrac{1}{100}$° per annum, but which actually each exhibit too small a change of declination.

The technique of repeating and combining observations made on the same quantity 
appears to have been introduced into scientific method by Tycho Brahe towards the end 
of the sixteenth century. According to his biographer, Dreyer (1890, p. 350): 


Each observation thus gave a value for the right ascension of α Arietis. During the following six years
Tycho repeated these observations as often as an opportunity offered, and, in order to eliminate the 
effect of parallax and refraction, he combined the results in groups of two, so that one was founded on an 
observation of Venus while east of the sun, the other on an observation of Venus west of the sun; while 
the observations were selected so that Venus and the sun as far as possible had the same altitude, declina- 
tion and distance from the earth in the two cases. From the observations of 1582 Tycho selects three 
single determinations, and from the years 1582-88 twelve results, each being the mean of two results 
found in the manner just described. The fifteen values of the right ascension of α Arietis agree
wonderfully well inter se, the probable error of the mean being only ± 6″, but the twenty-four single results in
the twelve groups show rather considerable discordances, the greatest and smallest differing by 16’ 30”. 




But anyhow the final mean adopted by Tycho is an exceedingly good one, agreeing well with the 
best modern determinations. He adopts for the end of the year 1585 26° 0’ 30”, the modern value for 
the same date being 26° 0’ 45”. 


The observations to which Dreyer refers are reproduced below from Tycho’s collected 
works (2, 170-97): 


1582 February 26        26°  0′ 44″
1582 March 20           26   0  32
1582 April 3            26   0  30

[The twelve paired determinations of 1582-88 that complete this table, each giving two dates,
two single results and their mean, are too badly garbled in this copy to be restored.]


The process of combining the first pair is thus described by Tycho (ibid. p. 171). 


Ab hac rursus Differentia Ascensionis vsque ad Lucidam ♈ subtracta, quae est part. 83. min. 57. ″.20,
prouenit Ascensio Clarae ♈, part. 25. ′.56. ″.10, cui pro Mensibus 3 residuis addantur ″.13, & obtinebimus
Ascensionem Rectam Lucidae ♈ part. 25. min. 56. ″.23, Anno 1585 completo correspondentem.
Sed Anno 82 ex Die 27 Februarij, fuit eadem Ascensio Recta prius data part. 26. min. 4. ″.16, vt sit
differentia vtriusque min. 7. ″.53: Dimidiata min. 3. ″.56½ addita minori vel subtracta a maiore,
prodit vera & limitata Ascensio Recta Lucidae ♈ part. 26. ′.0. ″.20. Quam hac Methodo nulla habita
ratione Parallaxium atque Refractionum, sed illis sese mutuo sic corrigentibus, inquirere propositum
erat.
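The arithmetic of the passage just quoted is easily followed in seconds of arc. In the sketch below the helper names `to_arcsec` and `from_arcsec` are hypothetical; the two right ascensions, 25° 56′ 23″ and 26° 4′ 16″, are those given by Tycho. Half the difference is added to the smaller value, which is the same as taking the arithmetic mean of the pair, and Tycho's 26° 0′ 20″ is this mean rounded to the nearest second.

```python
def to_arcsec(degrees, minutes, seconds):
    """Convert degrees, minutes and seconds of arc to seconds of arc."""
    return degrees * 3600 + minutes * 60 + seconds


def from_arcsec(total):
    """Convert seconds of arc back to (degrees, minutes, seconds)."""
    degrees, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return int(degrees), int(minutes), seconds


smaller = to_arcsec(25, 56, 23)    # reduced to the end of 1585
larger = to_arcsec(26, 4, 16)      # from 27 February 1582

difference = larger - smaller      # 7' 53"
half_difference = difference / 2   # 3' 56.5"
mean = smaller + half_difference   # identical to (smaller + larger) / 2

print(from_arcsec(difference), from_arcsec(half_difference), from_arcsec(mean))
```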


The average of the twelve determinations by means of two is 26° 0’ 27”, and the average 
of all fifteen is 26° 0’ 29”. How Tycho arrives at 26° 0’ 30” is not described, but we note that 
the co-ordinates of the nine standard stars in his catalogue are all given at 5” intervals, more 
than adequate for observational purposes. In fifteen cases out of eighteen, the co-ordinates 
differ from their exact values by less than 1’, and Kepler has described in a famous passage 
(Astronomia Nova..., Chap. 19; Werke, 3, 178) how he was able to calculate the elements of 
a circular orbit for Mars, differing from Tycho’s observations by 8’ or less, but rejected it 
because he knew that errors of 8’ could not be neglected with so diligent an observer. 










We see that Tycho used the arithmetic mean to eliminate systematic errors. The calcula- 
tion of the mean as a more precise value than a single measurement is not far removed and 
had certainly appeared about the end of the seventeenth century, as is shown by the following 
extract from Flamsteed’s discussion of the errors produced by his mural arc on the right 
ascensions of stars (1725, vol. 3, p. 137): 


Rectarum Solis Adscensionum Differentia inter 14° Martii ac 15° Septembris [of 1690] ex
Observationibus circa Solem pro istis Diebus reperitur, viz.

    per Calcem Castoris                                    178° 36′  0″
    per PROCYONEM                                          178  36   5
    per POLLUCEM                                           178  36  20
    Media inter has Differentia                            178  36   8
    At hanc Mediam subtrahendo a Solis Recta
      Adscensione 15° Septembris, viz.                     182  31  53
                                                           178  36   8
    remanet eius vera Recta Adscensio 14° Martii Meridie     3  55  45
    quae verum dat eius Locum                           ♈    4  17   7


A third example illustrates the combination of data from different observers. During 
1736-7, a French expedition under Maupertuis was sent to Lapland in order to measure the 
length of a degree of latitude and, by comparing it with the corresponding length in France, 
to decide whether the earth was flattened at the poles, as maintained, e.g. by Newton, or 
at the equator, as held by the Cassini family. Their method of observation, as described by 
Outhier, has been summarized by Clarke (1880, p. 5) as follows: 


Each observer made his own observation of the angles and wrote them down apart, they then
took the means of these observations for each angle: the actual readings are not given, but the
mean is.

In the event, the degree proved to be longer in Lapland, and Voltaire congratulated 
Maupertuis on having flattened both the poles and the Cassinis. 

At about this time, the calculus of discrete probability assumed an organized form, and 
the appearance of the differential calculus made extensions to continuous probability 
possible. The distribution of the arithmetic mean now began to receive the attention of 
mathematicians who were conversant with the new techniques, and a pioneer study by 
Simpson was followed by a long memoir from Lagrange. 

In his paper of 1755, Simpson gives the probability that the mean of $t$ observations is at
most $m/t$ for the following two error distributions:


(i) possible errors are —v,..., —2,—1,0,1,...,v and equal probabilities are attached 
to them; 

(ii) the same set of errors with probabilities proportional to 1, 2,...,v+1,...,2, 1, 
respectively. 


The solution for (i), when expressed as a gaming problem, was known by 1710 and Simpson’s 
treatment by generating functions is the same as de Moivre’s (Todhunter, p. 85); since the 
generating function for (ii) is the square of what it is for (i), Simpson’s initial contribution 
amounted mainly to realizing the physical interpretation of a mathematical result. 
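The generating-function treatment can be illustrated by multiplying polynomials whose coefficients are the weights of the possible errors. The sketch below is not Simpson's own notation; the function names are hypothetical. It computes the probability that the sum of $t$ errors is at most $m$, which is the probability that their mean is at most $m/t$, and it checks that squaring a polynomial with $v + 1$ equal coefficients yields the weights $1, 2, \ldots, v+1, \ldots, 2, 1$ of distribution (ii).

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two polynomials given as lists of coefficients."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_pow(a, t):
    """Raise a polynomial to the t-th power by repeated multiplication."""
    result = [1]
    for _ in range(t):
        result = poly_mul(result, a)
    return result


def prob_mean_at_most(weights, v, t, m):
    """P(mean of t errors <= m/t) for errors -v..v with the given weights."""
    coefficients = poly_pow(weights, t)   # index k stands for a sum of k - t*v
    favourable = sum(c for k, c in enumerate(coefficients) if k - t * v <= m)
    return Fraction(favourable, sum(coefficients))


v = 3
uniform_weights = [1] * (2 * v + 1)                        # distribution (i)
triangular_weights = poly_mul([1] * (v + 1), [1] * (v + 1))  # distribution (ii)

print(triangular_weights)
print(prob_mean_at_most(uniform_weights, v, t=2, m=0))
print(prob_mean_at_most(triangular_weights, v, t=2, m=0))
```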

What is novel in Simpson’s work appears in the four pages of additional material published 
in 1757. Here he extends the solution of the second problem to the limiting case where the 
error distribution is continuous, in the form of an isosceles triangle, and, by integration, 
finds the probability that the mean is nearer to zero than a single independent observation. 











134 History of probability and statistics. VII 


Simpson’s debt to de Moivre is clear and the widespread respect which The Doctrine of 
Chances inspired during this period is notably attested by the following quotation from a 
letter written by Lagrange to Laplace on 30 December 1776. 


Il est vrai que j'ai eu autrefois l'idée de donner une traduction de l'Ouvrage de Moivre,
accompagnée de notes et d'additions de ma façon, et j'avais même déjà traduit une partie de cet
Ouvrage; mais j'ai depuis longtemps renoncé à ce projet, et je suis enchanté d'apprendre que vous
en avez entrepris l'exécution, persuadé qu'elle répondra à la haute idée qu'on a de tout ce qui sort de
votre plume.


In the first fifty pages of his memoir, Lagrange presents a detailed discussion of discrete 
error distributions, on lines essentially the same as those followed by Simpson; he again makes 
free use of generating functions, and again extends results from discrete to continuous 
distributions by appropriate limiting processes. This section also includes (problem 6) 
a derivation of what we would now describe as the maximum likelihood estimates of the 
parameters in a multinomial distribution; and purports to show (problems 4 and 5) that the 
mode of the distribution of sample means is the same as the population mean. The chief 
contribution of the memoir to the probability theory of the arithmetic mean occurs in its 
last twelve pages, where Lagrange gives a method of obtaining the results for continuous 
distributions directly. He begins by evaluating 





$\int_0^\infty x^{m-1}a^{-x}\,dx = \frac{(m-1)!}{(\log a)^m},$

where $a$ is larger than unity. He now says that the coefficient of $a^{p-x}$ in

$(Pa^p + Qa^{p-1} + Ra^{p-2} + \ldots)/(\log a)^m$,   (1)

is obtained on replacing

$1/(\log a)^m$ by $\int x^{m-1}a^{-x}\,dx/(m-1)!$

and is thus given by

$\{Px^{m-1} + Q(x-1)^{m-1} + R(x-2)^{m-1} + \ldots\}\,dx/(m-1)!$.

He next asserts that the probability element for the sum of $n$ independent variables, each
with density function $y(x)$, is the coefficient of $a^x$ in $[\int y\,a^x\,dx]^n$, where the term 'coefficient'
is used in the sense just defined. Several examples follow, in all of which the error distribution
has a finite range, so that $[\int y\,a^x\,dx]^n$ is a sum of terms like (1), and is therefore amenable to
the processes he has described. The last error distribution is given by

$y = K\cos x \quad (-\tfrac{1}{2}\pi < x < \tfrac{1}{2}\pi),$


and the memoir concludes with a set of ingenious manipulations involving imaginary 
quantities. 
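The integral with which Lagrange begins, $\int_0^\infty x^{m-1}a^{-x}\,dx = (m-1)!/(\log a)^m$ for $a > 1$, can be checked numerically. The sketch below is an illustration and no part of the memoir: the function name is hypothetical, the infinite range is truncated at an upper limit chosen so that the neglected tail is negligible, and the quadrature is the composite form of Simpson's rule.

```python
import math


def lagrange_integral(m, a, upper=200.0, steps=200_000):
    """Approximate the integral of x**(m-1) * a**(-x) over (0, upper).

    Composite Simpson's rule; `steps` must be even and `upper` large
    enough for the neglected tail to be negligible (assumed, not proved).
    """
    h = upper / steps

    def f(x):
        return x ** (m - 1) * a ** (-x)

    total = f(0.0) + f(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3


m, a = 4, 2.0
numerical = lagrange_integral(m, a)
closed_form = math.factorial(m - 1) / math.log(a) ** m
print(numerical, closed_form)
```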

At this interval of time, we can recognize the last part of Lagrange’s memoir as a starting 
point for the theory of integral transforms, although its merits were scarcely visible 
to Todhunter, writing in 1865. However, they were at once appreciated by Laplace,
who refers to ‘la belle méthode que vous donnez’ in a letter written to Lagrange on 







11 August 1780, and who subsequently made the technique a basic part of his attack on the 
problem of combining observations. 


I am very grateful to Dr A. Fletcher for his invaluable suggestions and guidance on 
astronomical matters, and for greatly improving my translations. 


REFERENCES 


CLARKE, A. R. (1880). Geodesy. Oxford: Clarendon Press.

DELAMBRE, J. B. J. (1817). Histoire de l'astronomie ancienne, 2 vols. Paris: Courcier.

DREYER, J. L. E. (1890). Tycho Brahe; a Picture of Scientific Life and Work in the XVIth Century.
Edinburgh: Black.

FLAMSTEED, J. (1725). Historia Coelestis Britannica, 3 vols. London: Meere.

KEPLER, J. (1609, collected works 1937). Astronomia Nova..., Gesammelte Werke, 3, ed. M. Caspar.
Munich: Beck.

LAGRANGE, J. L. (about 1775). Mémoire sur l'utilité de la méthode de prendre le milieu entre les
résultats de plusieurs observations, dans lequel on examine les avantages de cette méthode par
le calcul des probabilités, et où l'on résout différents problèmes relatifs à cette matière.
Miscellanea Taurinensia, 5; (1868) Œuvres, 2, 173-234.

LAGRANGE, J. L. (1776). Letter to Laplace, 30 December; (1892). Œuvres, 14, 66.

LAPLACE, P. S. (1780). Letter to Lagrange, 11 August; (1892). Œuvres de Lagrange, 14, 95.

MANITIUS, K. (1913). Des Claudius Ptolemäus Handbuch der Astronomie, 2 vols. Leipzig: Teubner.

NEUGEBAUER, O. (1951). The Exact Sciences in Antiquity. Copenhagen: Ejnar Munksgaard.

NEUGEBAUER, O. (1955). Astronomical Cuneiform Texts, 3 vols. London: Lund Humphries.

PANNEKOEK, A. (1955). Ptolemy's precession. Vistas in Astronomy, vol. 1, pp. 60-66, ed. A. Beer.
London and New York: Pergamon.

SIMPSON, T. (1755). A letter to the Right Honourable George Earl of Macclesfield, President of the
Royal Society, on the Advantage of taking the mean of a number of observations in practical
astronomy. Phil. Trans. 49, part 1, 82-93.

SIMPSON, T. (1757). An attempt to show the advantage arising by taking the mean of a number of
observations in practical astronomy. Miscellaneous Tracts on some curious and very interesting
Subjects in Mechanics, Physical Astronomy, and Speculative Mathematics, ..., pp. 64-75.

TODHUNTER, I. (1865). A History of the Mathematical Theory of Probability from the Time of Pascal
to that of Laplace. London: Macmillan.

TYCHONIS BRAHE DANI (1602, collected works 1915). Opera Omnia, Tomus II, ed. I. L. E. Dreyer.
Hauniae, in Libraria Gyldendaliana.








[ 136 ] 


MULTIVARIATE LINEAR STRUCTURAL RELATIONS 


By R. L. BROWN AND F. FEREDAY
British Coal Utilization Research Association, Leatherhead, Surrey 


Given n observations of m-variates having known errors, the envelope of primes, associated with a 
given probability level, is shown to be a quadric primal the nature of which determines the accept- 
ability or otherwise of a prime as a structural relation. If the variates derive from r independent 
linear equations, i.e. an (m—r)-fold, the rational definition proposed is that the (m—r)-fold is an 
acceptable structural relation if every prime through it is acceptable. 

The consequences of the rational definition are shown to be consistent and are contrasted with 
some less satisfactory properties of Tintner’s method. The coefficients in the relation are not estimated 
in the usual way, although a ‘best’ relation can be given. Certain practical advantages are noted. 


The connexion with canonical correlations, confluence analysis and other qualitative methods is 
discussed briefly. 


1. INTRODUCTION 


For the study of the linear structural relation between two variates, Brown (1957) has 
proposed an envelope method, which yields a region bounding all relations acceptable at 
some assigned probability level. The shape of the acceptance region determines whether or 
not the evidence is in accord with the hypothesis of a linear structural relation. Thus, the 
coefficients in the relation are not examined separately, the relation being treated in toto. 
In this paper the envelope method is generalized to the case of one or more linear structural 
relations amongst m-variates. 

Let $x_1, x_2, \ldots, x_n$ be $n$ observations of a vector variate $x$, which is multivariate normally distributed with dispersion matrix $I$, the unit matrix. Each of these normalized observations may be represented as a point $P_j$ in a flat $m$-dimensional space, deriving from an unknown true point $Q_j$ with co-ordinates $X_j$. The simplest linear relation that can obtain for the true points is the primal relation,

$$\alpha_0 + \alpha' X = 0, \tag{1-1}$$

where $\alpha_0$ is a scalar and $\alpha'$ the row vector transposed from the column vector $\alpha$. We consider the acceptability or otherwise of such a primal relation, having regard to the known errors of measurement in each of the variates, by finding an acceptance region, in the space of true points, enveloping all primes having an assigned probability. As in the bivariate case, there exists a $\phi$-number, such that $n\phi$ is a random variable distributed as $\chi^2$ with $n$ degrees of freedom, which can be expressed in terms of the observations and the coefficients $\alpha_0$, $\alpha'$ of the prime (1-1). Thus if $N_j$ is the foot of the perpendicular from $P_j$ on to the prime

$$P_jN_j = \frac{\alpha_0 + \alpha' x_j}{(\alpha'\alpha)^{1/2}} = \frac{\alpha_0 + \alpha'(X_j + \delta_j)}{(\alpha'\alpha)^{1/2}} = \frac{\alpha'\delta_j}{(\alpha'\alpha)^{1/2}}, \tag{1-2}$$

where $\delta_j$ is the vector error corresponding to $x_j$. Let $z$ be the random variable $\alpha'\delta_j/(\alpha'\alpha)^{1/2}$; then $\mathscr{E}(z) = 0$,

$$\operatorname{var}(z) = \operatorname{var}\left\{\frac{\alpha'\delta_j}{(\alpha'\alpha)^{1/2}}\right\} = \frac{\alpha'(\operatorname{var}\delta_j)\,\alpha}{\alpha'\alpha} = 1. \tag{1-3}$$

Hence,

$$n\phi = \sum_{j=1}^{n} \frac{(\alpha'\delta_j)'(\alpha'\delta_j)}{(\alpha'\alpha)} = \sum_{j=1}^{n} \frac{\delta_j'(\alpha\alpha')\,\delta_j}{(\alpha'\alpha)}$$

has a $\chi^2$ distribution. But

$$n\phi = \sum_{j=1}^{n} \frac{(\alpha_0 + \alpha'x_j)'(\alpha_0 + \alpha'x_j)}{(\alpha'\alpha)} = \sum_{j=1}^{n} (P_jN_j)^2 \tag{1-4}$$





is determined by the observations and the coefficients of the prime only. The remaining $n(m-1)$ degrees of freedom are associated with the displacements of the true points $Q_j$ from $N_j$. We can now make the probability statement

$$\Pr(\phi < \phi_p) = p, \tag{1-5}$$

where $p$ is some assigned level of probability. Then the numerical statement (1-5) leads by the envelope method (§2) to a quadric primal envelope of acceptable primes and, by the confidence method (§4), to a confidence region for the coefficients $\alpha_0$, $\alpha$ of the prime jointly.
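
As an illustration of (1-4), the statistic for a trial prime can be evaluated directly from the normalized observations. The following Python sketch is not part of the original paper: the function name `phi_for_prime` and the four sample points are hypothetical, and the observations are assumed to be already scaled to unit error variance.

```python
from typing import Sequence


def phi_for_prime(alpha0: float, alpha: Sequence[float],
                  observations: Sequence[Sequence[float]]) -> float:
    """Return phi = (1/n) * sum_j (alpha0 + alpha'x_j)^2 / (alpha'alpha), as in (1-4)."""
    norm_sq = sum(a * a for a in alpha)
    if norm_sq == 0.0:
        raise ValueError("alpha must not be the zero vector")
    total = 0.0
    for x in observations:
        residual = alpha0 + sum(a * xi for a, xi in zip(alpha, x))
        total += residual * residual / norm_sq  # squared perpendicular (P_j N_j)^2
    return total / len(observations)


# Hypothetical normalized observations in m = 2 dimensions.
obs = [(1.0, 1.0), (2.0, 0.0), (0.0, 2.0), (1.0, -1.0)]

# Trial prime x1 + x2 - 2 = 0; only the last point lies off it.
phi = phi_for_prime(-2.0, (1.0, 1.0), obs)

# Rescaling all coefficients describes the same prime and leaves phi unchanged.
phi_rescaled = phi_for_prime(-4.0, (2.0, 2.0), obs)
```

The value `phi` would then be compared with the critical value $\phi_p$ of statement (1-5).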

Although the algebra thus far is similar to that given elsewhere, for example in the study 
of canonical correlations and principal components (Hotelling, 1933, 1936; Bartlett, 1934, 
1937, 1941), the objective here is different in that we do not assume a priori that there is 
a structural relation. Nor are we concerned primarily with estimation although, having 
shown the existence of a relation, we can afterwards indicate a ‘best’ relation. It is then 
not necessary to give limits for the individual coefficients in the relation and indeed these 
are interrelated (cf. Brown, 1957).

Tintner (1945), using results due to Fisher (1938) and Hsu (1939, 1940), has shown that the rank of the variance-covariance matrix of the observations can be established from the roots $\phi_1, \ldots, \phi_r$ of the determinantal equation (cf. equation (2-3) below) by applying a $\chi^2_{rn_r}$ test to the statistic

… (1-6)

Here $rn_r$ is $r(n-m+r-1)$, the degrees of freedom associated with an $(m-r)$-fold in $m$ dimensions. This follows by noting that $r$ stars of primes through $r$ non-intersecting $(r-2)$-folds determine once, and once only, every $(m-r)$-fold in the space and that a prime through an $(r-2)$-fold has $(m-r+1)$ degrees of freedom. This test establishes whether the true points derive from an $(m-r)$-fold, which case we propose to call partial relations of order $r$, there being $r$ independent linear equations.

Tintner (see §6) appears to have defined the $(m-r)$-fold in terms of the perpendiculars from the observations.* As an alternative to this perpendiculars definition, we propose the rational definition that every prime through an $(m-r)$-fold must be acceptable if the structural relation is partial of order $r$ (§3). This definition is obviously in accord with the mathematical requirement that any $r$ independent primes define an $(m-r)$-fold. Taking the view that the confocal system of quadric primals derived from equations (1-4, 5) is to be regarded as a transformation into true point space of the errors, the degrees of freedom of $n\phi$ are $n$, whatever the order of the partial relation. Whereas for Tintner the latent roots $\phi_i$ have a probability distribution, the envelope method yields fixed $\phi_i$, the probability distribution being associated with $\phi_p$ in equation (1-5). If it were felt that the coefficients of the quadric primal had been ‘estimated’ from the observations, it would be appropriate to diminish the degrees of freedom accordingly, provided it could be shown that $n\phi$ could be split into independent parts, each separately having a $\chi^2$-distribution. Our present view is that such a procedure would be incorrect.


* It is shown in §6 that the envelope method also may be applied to partial relations under the perpendiculars definition.










The rational definition of partial relations is easily seen to lead to the result that the order 
of partial relations cannot be derived from consideration of the relations holding amongst 
subsets of the variates. This fact shows (§7) the inadequacy of confluence analysis (Frisch, 
1934; Mudgett & Frisch, 1931; Stone, 1945). 

An advantage of the envelope method is that in practical examples we may choose 
unambiguously to calculate partial relations of order r, either as the intersection of r primes 
(using for this purpose the principal primes corresponding to the r smallest roots of the 
determinantal equation), or in parametric form (using the principal axes associated 
with the largest roots). If two or more of the roots are nearly equal their use can thereby 
be avoided and the necessity of carrying a large number of figures in the numerical work 
is obviated. 

The work is limited to linear relations. It will not indicate the presence of two or more independent partial relations. Thus, where the true points lie on a pair of distinct lines in four dimensions, the test would lead either to a twofold or a threefold, according as the lines met or were skew. Such independent partial relations are a degenerate case of curvilinear partial relations (the pair of lines being a degenerate conic or twisted cubic). There is, however, one simple but important case that can be settled by the present analysis. There may be present an element $x_i$ of the vector $x$ that is irrelevant to the other variables; then $x_i = \text{constant}$ is an acceptable primal relation. Directly, however, $\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2/(n-1)$ is an estimate of a unit variance having $(n-1)$ degrees of freedom and therefore may be tested against $\chi^2_{n-1}/(n-1)$. It will be assumed that such ‘variable constants’ are omitted from the analysis.


2. ACCEPTANCE QUADRIC 


The variance-covariance matrix $U$ of the observations is composed of elements $u_{ij}$, where

$$n u_{ij} = \sum_{k=1}^{n} x_{ik} x_{jk}, \tag{2-1}$$

the origin being at the mean of the observations. Then from (1-4)

$$\phi = \frac{\alpha_0^2 + \alpha' U \alpha}{\alpha'\alpha}, \tag{2-2}$$

and the stationary values $\phi_i$, $i = 1, 2, \ldots, m$, of $\phi$ are latent roots of $U$, i.e. of

$$|U - \phi I| = 0. \tag{2-3}$$

The primes corresponding to these stationary values are derived from the latent vectors $l$ of $U$; writing $L$ for the $m \times m$ matrix of the $m$ latent vectors

$$UL = L\Lambda; \qquad L^{-1} = L' \tag{2-4}$$

for $L$ is an orthogonal matrix; $\Lambda$ is the diagonal matrix with elements $\phi_i$.
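
The statement that the stationary values of $\phi$ are the latent roots of $U$ can be checked numerically in two dimensions, where the roots have a closed form. The Python sketch below is illustrative only: the matrix entries are hypothetical, the function names are not from the paper, and the prime is taken through the mean so that $\alpha_0 = 0$ and, by (1-4), $\phi = \alpha'U\alpha/\alpha'\alpha$.

```python
import math


def latent_roots_2x2(u11: float, u12: float, u22: float):
    """Roots of |U - phi I| = 0 for symmetric U = [[u11, u12], [u12, u22]], ascending."""
    mean = 0.5 * (u11 + u22)
    radius = math.hypot(0.5 * (u11 - u22), u12)
    return mean - radius, mean + radius


def phi_of_direction(u11: float, u12: float, u22: float,
                     a1: float, a2: float) -> float:
    """phi = alpha'U alpha / alpha'alpha for a prime through the mean (alpha_0 = 0)."""
    quad = u11 * a1 * a1 + 2.0 * u12 * a1 * a2 + u22 * a2 * a2
    return quad / (a1 * a1 + a2 * a2)


# Hypothetical variance-covariance matrix of normalized observations.
U = (2.0, 1.0, 2.0)
phi_1, phi_2 = latent_roots_2x2(*U)

# Scan the normal direction of the prime in one-degree steps over a half turn.
scan = [phi_of_direction(*U, math.cos(math.radians(k)), math.sin(math.radians(k)))
        for k in range(180)]
smallest, largest = min(scan), max(scan)
```

Over the scan, $\phi$ never leaves the interval between the two latent roots and attains them when the normal of the prime lies along a latent vector.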
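For any chosen significance level of $\phi$, the prime (1-1) is constrained by the relationship (2-2). The envelope of all such primes is found by eliminating the coefficients from the equations

$$\frac{\partial(\alpha_0^2 + \alpha_i\alpha_j u_{ij} - \phi\,\alpha_k\alpha_k)}{\partial\alpha_s} = \lambda\,\frac{\partial(\alpha_0 + \alpha_k X_k)}{\partial\alpha_s} \quad (s = 0, \ldots, m;\ i, j, k \text{ summed double suffix } 1 \text{ to } m), \tag{2-5}$$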










where $\lambda$ is a constant factor. If $U^{ij}$ is the cofactor of the $(ij)$th element in $|U - \phi I|$, the envelope is easily seen to be the quadric primal

$$|U - \phi I| + U^{ij} X_i X_j = 0, \tag{2-6}$$

which we may rewrite in matrix form as

$$1 + X'(U - \phi I)^{-1} X = 0. \tag{2-7}$$

As shown above, we have an orthogonal matrix $L$ formed by the latent vectors of $U$; we transform to new co-ordinates $Y$, where

$$Y = L'X. \tag{2-8}$$

Now, since $X = LY$,

$$X'(U - \phi I)^{-1} X = Y'L'(U - \phi I)^{-1} L\,Y = Y'\{L'(U - \phi I)L\}^{-1} Y.$$

But $L'UL = \Lambda$, $L'IL = I$.
Hence the envelope in $Y$-co-ordinates is

$$1 + Y'(\Lambda - \phi I)^{-1} Y = 0. \tag{2-9}$$


But $(\Lambda - \phi I)$ is a diagonal matrix with elements $(\phi_i - \phi)$; hence $(\Lambda - \phi I)^{-1}$ is a diagonal matrix and we have for the envelope

$$\sum_{i=1}^{m} \frac{Y_i^2}{\phi_i - \phi} + 1 = 0. \tag{2-10}$$

Obviously the principal primes of the quadric primal (2-10) are the co-ordinate primes $Y_i$ and also the primes (cf. (2-3), (2-4)) corresponding to the stationary values $\phi_i$ of $\phi$. The $Y$-co-ordinates are a canonical set such that co-variances of the observations are zero. For different $\phi$ the quadrics are confocal.

It is well known that through any point $T$ there are $m$ confocals (for which $\phi$ takes the values $t_i$) and that the tangent primes $\tau_i$ to these confocals are mutually orthogonal; thus





m T} m T? m T? 


AG-0"* AGG” AG” 


(all $i, j$; $i \neq j$). (2-11)

Also we have the inequalities

$$\phi_1 < t_1 < \phi_2 < t_2 < \cdots < \phi_m < t_m < \infty, \tag{2-12}$$


associated with the latent roots $\phi_i$, arranged in ascending order of magnitude, of the dispersion matrix $U$ of the observations.
Any prime through $T$ may be written,

$$\sum_{i=1}^{m} \lambda_i \tau_i = 0 \tag{2-13}$$

and the corresponding statistic $\phi$ is easily seen, using equations (2-11), to be

$$\phi = \frac{\sum_{i=1}^{m} \lambda_i^2 t_i}{\sum_{i=1}^{m} \lambda_i^2}, \tag{2-14}$$










so that $\phi$ is stationary when the prime through $T$ is one of the tangent primes $\tau_i$. It is an immediate corollary that the primes through the intersection of the primes $\tau_1, \tau_2, \ldots, \tau_r$ have stationary values of $\phi$ when they coincide respectively with $\tau_1, \tau_2, \ldots, \tau_r$ and the stationary values are $t_1, t_2, \ldots, t_r$.


3. PRIMAL AND PARTIAL RELATIONS 


For any chosen significance level of $\phi$, say $\phi_p$, associated with probability level $p$, we prove that the necessary and sufficient conditions for a partial relation of order $r$ to obtain between the variates are

$$\phi_1, \ldots, \phi_r < \phi_p < \phi_{r+1}, \ldots, \phi_m. \tag{3-1}$$

Take any point $T$ on the acceptance quadric (2-10) for which $\phi$ is $\phi_p$. Then $\phi_p$ is one of the parameters $t_i$ that serve to define the $m$ confocals through $T$. Let $\phi_p$ be $t_r$. Then by equation (2-12) there are $(r-1)$ smaller values of $t_i$. Let $M_r$ be an $(m-r)$-fold tangent to the acceptance quadric at $T$ and formed by the intersection of the $r$ tangent primes $\tau_1 \ldots \tau_r$. Then every prime through $M_r$ determines a statistic $\phi$ that has stationary values when the prime coincides with $\tau_1, \tau_2, \ldots, \tau_r$ respectively, and therefore has a maximum at $t_r = \phi_p$. Thus $M_r$ is an acceptable $(m-r)$-fold. If $\phi_p = \phi_r$, the $(m-r)$-fold $Y_1 \ldots Y_r$ is just acceptable. It follows from equation (2-12) that (3-1) are necessary conditions if there are to be acceptable $(m-r)$-folds tangent to the acceptance quadric $\phi_p$. Obviously (cf. equations (4-4, 5) below) every $(m-r)$-fold parallel to $M_r$ and nearer to the origin is such that the maximum value of $\phi$, for all primes through it, is less than $\phi_p$: if the $(m-r)$-fold is further than $M_r$ from the origin there will be some prime through it for which $\phi$ exceeds $\phi_p$. Thus the acceptance quadric $\phi_p$ bounds the acceptable $(m-r)$-folds of type $M_r$.

Through the point $T$ on the acceptance quadric $\phi_p$ there are acceptable $(m-r)$-folds tangent to the quadric and not coinciding with $M_r$. It is not, however, essential to classify them, since, for the partial relations to be of order $r$ at least, it is enough to show that there is at least one acceptable $(m-r)$-fold. It is geometrically intuitive that we may rotate $M_r$ about the point $T$ and in the tangent prime $\tau_r$ as far as the generating surface through $T$ without causing the $(m-r)$-fold to cease to be acceptable, the generating surface being the generalization of the pair of generating lines through any point of a quadric in three dimensions. In other words, we may rotate $M_r$ until it passes outside the quadric (see §5). Once we pass through the generating surface, however, the $(m-r)$-folds become unacceptable. Analogously, in two dimensions, there is a line of ‘worst’ fit as well as one of ‘best’ fit. Finally, the acceptance quadric $\phi_p$ bounds all acceptable $(m-r)$-folds.

It remains to show that the conditions (3-1) are sufficient. Let $M_{r+1}$ be the $(m-r-1)$-fold through the origin formed by the intersection of the $(r+1)$ co-ordinate primes $Y_1, Y_2, \ldots, Y_{r+1}$. These are stationary primes and hence there is one prime through $M_{r+1}$ for which $\phi$ is $\phi_{r+1}$, which exceeds $\phi_p$. Hence $M_{r+1}$ is not acceptable. For any other $(m-r-1)$-fold containing the origin, the maximum value of $\phi$ for primes through it will exceed $\phi_{r+1}$. For any $(m-r-1)$-fold not through the origin, the maximum value of $\phi$ for primes through it exceeds that for a parallel $(m-r-1)$-fold containing the origin. Hence no $(m-r-1)$-fold is acceptable.

For $\phi_p < \phi_1$, the acceptance quadric is imaginary and there is no acceptable linear relation. For $\phi_p = \phi_1$ the co-ordinate prime $Y_1$ is just acceptable; the acceptance quadric becomes a focal quadric in the prime $Y_1 = 0$. For $\phi_1 < \phi_p < \phi_2$ a prime is acceptable, and these are bounded by the acceptance quadric $\phi_p$, which is a generalized hyperboloid. For







$\phi_{m-1} \leq \phi_p < \phi_m$, a line is acceptable, the equality obtaining for the co-ordinate axis formed by the intersection of the co-ordinate primes $Y_1, Y_2, \ldots, Y_{m-1}$. For $\phi_p = \phi_m$ no relation obtains, as is also the case for $\phi_p > \phi_m$, when the acceptance quadric becomes a generalized ellipsoid: it could be said that the relation becomes indeterminate in direction and that only points lying inside the ellipsoid are acceptable.

We have shown that a rational mathematical definition of a partial relation of order $r$, namely that every prime through an $(m-r)$-fold must be acceptable, leads to a simple and self-consistent statistical test as given in equation (3-1), and also to a useful geometrical picture for the location of acceptable $(m-r)$-folds in relation to the acceptance quadric $\phi_p$.


4. CONFIDENCE QUADRIC

As has been shown previously (Brown, 1957), we may also derive a joint confidence region for the coefficients defining a prime. In $Y$-co-ordinates, let the prime be

$$\beta_0 + \beta'Y = 0. \tag{4-1}$$

From equation (2-4), the variance-covariance matrix of the observations is, after transforming to $Y$-co-ordinates, the diagonal matrix $\Lambda$ and we have

$$\phi = \frac{\beta_0^2 + \sum_k \phi_k \beta_k^2}{\sum_k \beta_k^2}. \tag{4-2}$$

Then for an assigned $\phi_p$, we may write the confidence quadric as

$$\beta_0^2 + \sum_k (\phi_k - \phi_p)\,\beta_k^2 = 0 \tag{4-3}$$

in homogeneous co-ordinates $(\beta_0, \beta_1, \ldots, \beta_m)$. This quadric has the same principal primes as the acceptance quadric. If all $\phi_k$ exceed $\phi_p$, this quadric is imaginary and there are no suitable primes.

If $(\beta_0, \beta_1, \ldots, \beta_m)$ is a point in confidence space corresponding to any general prime and $(\beta_0^{*}, \beta_1, \ldots, \beta_m)$ is a parallel prime for which the associated point in homogeneous coefficient space lies on (4-3), then

$$\beta_0^{*2} = \beta_0^2 + (\phi_p - \phi) \sum_k \beta_k^2, \tag{4-4}$$

where $\phi$ is associated with the general prime. Thus

$$|\beta_0| \lessgtr |\beta_0^{*}| \quad \text{as} \quad \phi \lessgtr \phi_p, \tag{4-5}$$

and we can say that all points lying ‘inside’ (4-3) yield acceptable primes, and vice versa.

But further interpretation depends on how we change to the ratios of the $(m+1)$ parameters $\beta_s$, $s = 0, \ldots, m$. For $\phi_1 < \phi_p < \phi_2 < \ldots < \phi_m$ and $\beta_1 = 1$, (4-3) is a generalized ellipsoid and the confidence region is bounded. Putting unity for any other $\beta$ yields a generalized hyperboloid; moreover, in terms of the coefficients $\alpha$ in $X$-space the quadric is in general non-central. Although both the acceptance quadric and the confidence quadric (however treated) yield the same answer, the complications ensuing from (4-3) do not appear to be worth pursuing.










5. THREE DIMENSIONS 


It appears worthwhile to give in more detail the results obtaining in three dimensions and, in particular, to prove the fundamental property of the acceptance quadric that was suggested in §3, namely, that every line lying wholly inside the quadric $\phi_p$ is acceptable at probability level $p$, and conversely.

It is clear that the two generating lines through any point on the acceptance quadric $\phi_p$ have a special property. One system of generators is given by

$$\frac{Y_1}{\sqrt{(\phi_p - \phi_1)}} - \frac{Y_3}{\sqrt{(\phi_3 - \phi_p)}} = \lambda\left(1 - \frac{Y_2}{\sqrt{(\phi_p - \phi_2)}}\right), \qquad \frac{Y_1}{\sqrt{(\phi_p - \phi_1)}} + \frac{Y_3}{\sqrt{(\phi_3 - \phi_p)}} = \frac{1}{\lambda}\left(1 + \frac{Y_2}{\sqrt{(\phi_p - \phi_2)}}\right), \tag{5-1}$$

from which it follows easily that every plane through the generator has $\phi = \phi_p$. To see this, write down the value of $\phi$ for any plane through the generator in the form $\sigma A + \nu B$, where $A$ and $B$ are the planes of (5-1). It will be found that both $\sigma$ and $\nu$ disappear. Following the argument of §3, take a tangent line between the generators such that the maximum $\phi$ for planes through it is $\phi_p$. As this line is rotated in the tangent plane, it passes at the generating line through a state where all planes through it have $\phi$ equal to $\phi_p$. Then on further rotation $\phi_p$ becomes a minimum value of $\phi$, not a maximum, so that the line ceases to be acceptable. Clearly, in passing through the generating line, we pass from the situation for which every point on the tangent line is inside the quadric to one where the tangent lies outside the quadric. Moreover, the geometrical picture suggests the conjecture that this proposition, worded appropriately for spaces of odd and even dimensions, is generally valid.

The condition that every point on the line

$$Y_1 = b_1 + l_1 u, \qquad Y_2 = b_2 + l_2 u, \qquad Y_3 = b_3 + l_3 u \tag{5-2}$$

shall be inside the quadric is

$$\frac{(b_1 + l_1 u)^2}{\phi_p - \phi_1} + \frac{(b_2 + l_2 u)^2}{\phi_p - \phi_2} + \frac{(b_3 + l_3 u)^2}{\phi_p - \phi_3} < 1, \quad \text{for all } u. \tag{5-3}$$

Hence we must have

$$H = \frac{l_1^2}{\phi_p - \phi_1} + \frac{l_2^2}{\phi_p - \phi_2} + \frac{l_3^2}{\phi_p - \phi_3} < 0, \qquad K = \frac{b_1^2}{\phi_p - \phi_1} + \frac{b_2^2}{\phi_p - \phi_2} + \frac{b_3^2}{\phi_p - \phi_3} < 1 \tag{5-4}$$

(one of these implies the other) and if

$$J = \frac{b_1 l_1}{\phi_p - \phi_1} + \frac{b_2 l_2}{\phi_p - \phi_2} + \frac{b_3 l_3}{\phi_p - \phi_3}, \tag{5-5}$$

then

$$\begin{vmatrix} -H & -J \\ -J & 1 - K \end{vmatrix} > 0, \tag{5-6}$$

which may be rewritten as,

$$\frac{m_1^2}{(\phi_p - \phi_2)(\phi_p - \phi_3)} + \frac{m_2^2}{(\phi_p - \phi_3)(\phi_p - \phi_1)} + \frac{m_3^2}{(\phi_p - \phi_1)(\phi_p - \phi_2)} > H, \tag{5-7}$$

where $m_1 = (l_2 b_3 - l_3 b_2)$, etc.







We have now to write down the conditions that every plane through the line shall have $\phi < \phi_p$. If the plane

$$c_1 Y_1 + c_2 Y_2 + c_3 Y_3 + 1 = 0 \tag{5-8}$$

is to contain the line (5-2), then

$$c_1 b_1 + c_2 b_2 + c_3 b_3 + 1 = 0, \qquad c_1 l_1 + c_2 l_2 + c_3 l_3 = 0 \tag{5-9}$$

and we may obtain a pencil of planes $\lambda$ by taking*

$$c_1 + c_2 + c_3 + \lambda = 0. \tag{5-10}$$

Then

$$c_1 = \frac{\lambda m_1 + n_1}{\Delta}, \text{ etc.,}$$

where

$$\Delta = \begin{vmatrix} b_1 & b_2 & b_3 \\ l_1 & l_2 & l_3 \\ 1 & 1 & 1 \end{vmatrix} = -(m_1 + m_2 + m_3); \qquad n_1 = l_3 - l_2, \text{ etc.} \tag{5-11}$$

All planes are acceptable if

$$V = \begin{vmatrix} \sum(\phi_p - \phi_i)\,m_i^2 & \sum(\phi_p - \phi_i)\,m_i n_i \\ \sum(\phi_p - \phi_i)\,m_i n_i & \sum(\phi_p - \phi_i)\,n_i^2 - \Delta^2 \end{vmatrix} > 0 \tag{5-12}$$

and

$$\sum(\phi_p - \phi_i)\,m_i^2 > 0. \tag{5-13}$$

It is also implied that $\sum(\phi_p - \phi_i)\,n_i^2 > \Delta^2$. Remembering that $\prod(\phi_p - \phi_i)$ is negative, (5-7) and the first of (5-4) show that (5-13) is valid. Now (5-12) is

$$(\phi_p - \phi_1)(\phi_p - \phi_2)(m_1 n_2 - m_2 n_1)^2 + \text{two similar terms} > \Delta^2\{(\phi_p - \phi_1)\,m_1^2 + \text{similar terms}\},$$

and it is a well-known property of determinants that $(m_1 n_2 - m_2 n_1)^2 = l_3^2 \Delta^2$, etc. Thus (5-7), (5-12) are the same and the proposition is proved.
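
The test that a line lies wholly inside the quadric is a sign condition on a quadratic in $u$: expanding (5-3) gives $Hu^2 + 2Ju + (K - 1) < 0$ for all $u$, which requires $H < 0$ and $H(K-1) - J^2 > 0$, the latter being the determinant (5-6). The Python sketch below applies this in three dimensions; the function name, the roots and the trial lines are all hypothetical values chosen for illustration.

```python
from typing import Sequence


def line_inside_quadric(b: Sequence[float], l: Sequence[float],
                        roots: Sequence[float], phi_p: float) -> bool:
    """True if every point b + l*u satisfies sum_i Y_i^2 / (phi_p - phi_i) < 1."""
    d = [phi_p - root for root in roots]
    if any(x == 0.0 for x in d):
        raise ValueError("phi_p must differ from every latent root")
    H = sum(li * li / di for li, di in zip(l, d))
    J = sum(bi * li / di for bi, li, di in zip(b, l, d))
    K = sum(bi * bi / di for bi, di in zip(b, d))
    # Quadratic H u^2 + 2 J u + (K - 1) must be negative for all u.
    return H < 0.0 and (H * (K - 1.0) - J * J) > 0.0


# Hypothetical roots with phi_2 < phi_p < phi_3, the case in which a line is acceptable.
roots = (0.5, 1.0, 5.0)
phi_p = 1.5

# Line parallel to the Y_3 axis, passing close to the origin.
near_axis = line_inside_quadric((0.2, 0.1, 0.0), (0.0, 0.0, 1.0), roots, phi_p)
# Line along the Y_1 axis: H > 0, so it must leave the quadric.
along_y1 = line_inside_quadric((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), roots, phi_p)
# Line parallel to the Y_3 axis but far from the origin: K > 1.
far_away = line_inside_quadric((2.0, 0.0, 0.0), (0.0, 0.0, 1.0), roots, phi_p)
```

Only the first of the three trial lines passes; the other two fail on the first and second of the conditions (5-4) respectively.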
Lastly, we note that the subsets of two variates, say $x_1$, $x_2$, have a determinantal equation with roots $\phi_1'$, $\phi_2'$ such that

$$\phi_1 \leq \phi_1' \leq \phi_2 \leq \phi_2' \leq \phi_3. \tag{5-14}$$

It is easy to write down $\phi_1'$, $\phi_2'$, and substitute in the cubic giving $\phi_1$, $\phi_2$, $\phi_3$ and the proof follows. A general proof follows from a result given by Turnbull & Aitken (1932). If we have a line in three dimensions, $\phi_1' < \phi_2 < \phi_p$ and therefore we also have a line in each of the two dimensions formed by omitting each variate in turn, for $n\phi_p$ has a $\chi^2_n$ distribution whatever $m$ may be. The converse is not true. These two findings are to be expected for a truly structural relation.


* In four dimensions we should have also a fourth equation in which the signs of $c_i$ are alternatively plus and minus.

† But from the perpendiculars definition (cf. §6) a line in three dimensions requires $n\phi_1 < \chi^2_n$, $n(\phi_1 + \phi_2) < \chi^2_{2n}$ and a line in two dimensions requires $n\phi_1' < \chi^2_n$. Since $\chi^2_{2n} < 2\chi^2_n$ for $p > 0.75$, $n(\phi_1 + \phi_2) < 2\chi^2_n$ and it does not follow that $n\phi_2 < \chi^2_n$; so that it cannot be deduced that $n\phi_1' < \chi^2_n$, and there is not necessarily a line in two dimensions. For this reason it would appear that the perpendiculars definition is less satisfactory than the rational definition.










6. ENVELOPE METHOD APPLIED TO PERPENDICULARS DEFINITION 


As already noted, another definition of an acceptable $(m-r)$-fold can be based on the requirement that the sum of the squares of the perpendiculars on to the $(m-r)$-fold should give an estimate of the error variance in accord with the appropriate independent compound variance derived from the known errors of measurement. This perpendiculars definition leads to Tintner’s test for a statistic $\psi$ having a $\chi^2$-distribution (cf. equation 1-6). We shall show that neither definition leads to a result derivable from the other, but that, in practice, the rational definition will usually lead to a smaller area for acceptable lines (in three dimensions) than does the perpendiculars definition. For simplicity the treatment is confined to three dimensions.

Our condition $\phi_1 < \phi_2 < \phi_p$ implies $(\phi_1 + \phi_2) < 2\phi_p$. Now for the interesting range of probability $p$ exceeding 0.75, $\chi^2_{2n} < 2\chi^2_n$ so that $\psi_p < 2\phi_p$.* Thus, we cannot deduce that $(\phi_1 + \phi_2) < \psi_p$. Conversely, the perpendiculars definition tells us that for the plane, $\phi_1 < \phi_p$, and for the line $(\phi_1 + \phi_2) < \psi_p$. Both relations obtain if a line is to be accepted; but it does not follow that $\phi_2 < \phi_p$. It is, however, possible that for some lines there is a relation between the associated $\phi$ and $\psi$ that would enable a probability level $p$ to be chosen so that one test implied the other. We show this is not the case. Consider the line in $Y$-space

$$Y_i = a_i p + l_i u; \qquad l_1^2 + l_2^2 + l_3^2 = 1, \quad a_1^2 + a_2^2 + a_3^2 = 1, \quad a_1 l_1 + a_2 l_2 + a_3 l_3 = 0 \tag{6-1}$$

and take $(L_1 L_2 L_3)$ to be a unit vector orthogonal to $(a_1 a_2 a_3)$ and to $(l_1 l_2 l_3)$. Then any plane containing the line is

$$(a_1 + \lambda L_1)\,Y_1 + (a_2 + \lambda L_2)\,Y_2 + (a_3 + \lambda L_3)\,Y_3 = p \tag{6-2}$$

and the statistic $\phi$ for the pencil of planes through the line has a maximum value $\phi_m$,

$$2\phi_m = \psi + \{\psi^2 - 4p^2(L_1^2\phi_1 + L_2^2\phi_2 + L_3^2\phi_3) - 4(l_1^2\phi_2\phi_3 + l_2^2\phi_3\phi_1 + l_3^2\phi_1\phi_2)\}^{1/2}, \tag{6-3}$$

where

$$\psi = (1 - l_1^2)\,\phi_1 + (1 - l_2^2)\,\phi_2 + (1 - l_3^2)\,\phi_3 + p^2. \tag{6-4}$$

The perpendiculars definition yields the statistic $\psi$. Hence $2\phi_m > \psi$. But $\chi^2_{2n} < 2\chi^2_n$ and the result follows.

To gain a closer insight into the relationship of the two tests, we may find the envelope of lines (6-1) having a constant $\psi$. Since $\psi$ is independent of the direction $(a_1 a_2 a_3)$ of the perpendicular from the origin on to the line, equivalently we find the envelope of the cylinder of lines distant $p$ from the origin and in the direction $(l_1 l_2 l_3)$. This is easily seen to be the quartic surface

$$\frac{Y_1^2}{R^2 + \phi_2 + \phi_3 - \psi} + \frac{Y_2^2}{R^2 + \phi_3 + \phi_1 - \psi} + \frac{Y_3^2}{R^2 + \phi_1 + \phi_2 - \psi} = 1, \quad \text{where } R^2 = (Y_1^2 + Y_2^2 + Y_3^2). \tag{6-5}$$

This surface is real only if $(\phi_1 + \phi_2) < \psi$ and $(\phi_2 + \phi_3) > \psi$, the sign of $(\phi_1 + \phi_3 - \psi)$ being immaterial.

In practice, it is usually the case that $\phi_3 \gg \phi_2$ and therefore near the origin (mean of observations) we can find an approximation to the quartic. A simpler result, sufficiently similar to the foregoing approximation for our present purpose, follows by considering the


* Tintner gives $\psi$ a $\chi^2_{2n-4}/(n-1)$ distribution for an ‘estimated’ line; here we have not estimated the line and so take $\chi^2_{2n}/n$ (cf. discussion of degrees of freedom in Introduction).







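special case $\phi_1 = \phi_2$, when the quartic becomes the quadric of revolution

$$\frac{Y_1^2 + Y_2^2}{\psi - 2\phi_1} - \frac{Y_3^2}{\phi_1 + \phi_3 - \psi} = 1. \tag{6-6}$$

The corresponding quadric obtained from the rational definition is

$$\frac{Y_1^2 + Y_2^2}{\phi - \phi_1} - \frac{Y_3^2}{\phi_3 - \phi} = 1. \tag{6-7}$$

Both quadrics are hyperboloids of one sheet. For $\phi_3 \gg \psi > \phi_1$ we can put $\psi - 2\phi_1 = \lambda a^2$, $\phi - \phi_1 = a^2$, $\phi_1 + \phi_3 - \psi = b^2$, $\phi_3 - \phi = b^2$, and compare the hyperbolas (with $Y^2 = Y_1^2 + Y_2^2$)

$$\frac{Y^2}{\lambda a^2} - \frac{Y_3^2}{b^2} = 1, \qquad \frac{Y^2}{a^2} - \frac{Y_3^2}{b^2} = 1,$$

where $\lambda$ is less than 2, but usually near it. We see from Fig. 1 that the rational definition limits the region of acceptable lines more than does the perpendiculars definition.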


[Figure: envelopes plotted against the axis $Y = (Y_1^2 + Y_2^2)^{1/2}$; the hatched regions are labelled ‘Rational definition’ and ‘Perpendiculars definition’.]

Fig. 1. Approximate comparison of envelopes of acceptable lines in three dimensions.


Both definitions lead to statistically valid results and each has its appropriate field of 
use. But the perpendiculars definition does not ensure that every plane through acceptable 
lines is itself acceptable. Since the latter appears to be a necessary requirement when we 
are considering the presence or absence of structural relations in toto, it would seem that the 


rational definition should be favoured. Fortunately the ‘best’ relations arising from the 
two definitions are the same. 
7. EXAMPLES 
7-1. Introduction 


In order to illustrate the foregoing and to compare the different methods of multivariate analysis, three examples have been constructed, each of twenty-five observations in four dimensions.

The equations used were:

Example I (a plane)      Example II (a prime)       Example III (a line)
Z_1 = 3u + 2v            Z_1 = 2u + 2v +  w         Z_1 = 5u
Z_2 =  u + 3v            Z_2 =  u + 2v + 2w         Z_2 = 6u
Z_3 = 5u -  v            Z_3 = 3u -  v + 2w         Z_3 = 7u
Z_4 = 7u -  v            Z_4 = 5u - 2v              Z_4 = 4u

u, v, w are running co-ordinates.

From a table of random numbers, twenty-five values were given to each of u, v, w, dif- 
ferent sets being used for each example. In this way, sets of twenty-five ‘true’ points were 
obtained. To these ‘true’ points were added random normal deviates (obtained from the 
Biometrika tables) after testing these deviates for any chance significant deviation from 


normality. The final ‘observed’ points are given in the Appendix, together with the standard 
deviations of the added error terms. 
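
The construction of Example I can be imitated as follows. This Python sketch is only an illustration of the procedure described above: the range of the running co-ordinates, the error standard deviations and the seed are assumptions (the values actually used are given in the Appendix, not here), and a pseudo-random generator stands in for the printed tables of random numbers and normal deviates.

```python
import random


def simulate_example_one(n: int = 25,
                         sigma=(1.0, 1.0, 1.0, 1.0),
                         seed: int = 0):
    """Return (true_points, observed_points) for the plane of Example I."""
    rng = random.Random(seed)
    true_points, observed_points = [], []
    for _ in range(n):
        # Assumed range for the running co-ordinates u, v.
        u = rng.uniform(0.0, 10.0)
        v = rng.uniform(0.0, 10.0)
        z = (3 * u + 2 * v, u + 3 * v, 5 * u - v, 7 * u - v)
        true_points.append(z)
        # Add an independent normal error to each co-ordinate.
        observed_points.append(tuple(zi + rng.gauss(0.0, s)
                                     for zi, s in zip(z, sigma)))
    return true_points, observed_points


true_points, observed_points = simulate_example_one()
```

Eliminating $u$ and $v$ shows that the true points satisfy $7Z_3 = 16Z_1 - 13Z_2$ and $7Z_4 = 22Z_1 - 17Z_2$, that is, two independent linear equations defining a plane in four dimensions.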


7-2. Comparison of rational and perpendiculars definitions


From the data thus obtained, the stationary values of $\phi$ were calculated and these are shown in Table 1.


Table 1. Stationary values of $\phi$

Example                 Roots $\phi_i$
  I        0.818      1.362      5.150      54.99
  II       0.863      2.213      18.60      198
  III      0.469      1.010      1.417      140.6





Examination of these roots according to the ‘rational definition’ involves testing the roots individually using 25 degrees of freedom. In this case the critical values of $\phi$ are 1.50 for the 0.05 level and 1.77 for the 0.01 level.
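
The critical values quoted are $\chi^2_{25}/25$ at the stated upper-tail levels. As a rough check that avoids a table look-up, the Python sketch below uses the Wilson-Hilferty cube-root approximation to the $\chi^2$ quantile; this approximation is a substitute introduced here, not the method of the paper, and the function names are hypothetical.

```python
from statistics import NormalDist


def chi2_upper_quantile_wh(dof: int, upper_tail: float) -> float:
    """Wilson-Hilferty approximation to the chi-square value exceeded with
    probability `upper_tail` (an approximation, not an exact quantile)."""
    z = NormalDist().inv_cdf(1.0 - upper_tail)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def critical_phi(n: int, upper_tail: float) -> float:
    """Critical value of phi when n*phi is distributed as chi-square with n d.f."""
    return chi2_upper_quantile_wh(n, upper_tail) / n


phi_05 = critical_phi(25, 0.05)
phi_01 = critical_phi(25, 0.01)
```

Both approximate values agree with the quoted 1.50 and 1.77 to within about 0.01.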


Table 2. Significance of roots according to the ‘rational definition’

Example                 Significance of roots
  I        0.818      1.362      5.15***      54.99***
  II       0.863      2.213***   18.60***     19.8***
  III      0.469      1.010      1.417        140.6***












In Table 2 and the following tables, one asterisk denotes the significance level 0.05, two asterisks the 0.02 level and three asterisks the 0.01 level.

The rational test reveals at once the nature of the relationships. 
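
The reading of Table 2 amounts to counting the roots that fall below the critical value, as in condition (3-1). The Python sketch below applies this count to the roots of Table 1 with the 0.05 critical value 1.50; the function names are hypothetical. The largest root of Example II is printed as 198 in Table 1 and as 19.8 in Table 2, and either value leads to the same classification.

```python
from typing import Sequence


def order_of_partial_relation(roots: Sequence[float], phi_p: float) -> int:
    """Number r of latent roots lying below the critical value phi_p."""
    return sum(1 for root in roots if root < phi_p)


def describe_relation(m: int, r: int) -> str:
    """Name the (m - r)-fold implied by r acceptable roots in m dimensions."""
    if r == 0 or r == m:
        return "no linear relation"
    fold = m - r
    if fold == 1:
        return "line"
    if fold == m - 1:
        return "prime"
    if fold == 2:
        return "plane"
    return f"{fold}-fold"


TABLE_1 = {
    "I": (0.818, 1.362, 5.150, 54.99),
    "II": (0.863, 2.213, 18.60, 198.0),  # last root as printed in Table 1
    "III": (0.469, 1.010, 1.417, 140.6),
}
PHI_P = 1.50  # critical value at the 0.05 level, 25 degrees of freedom

results = {name: describe_relation(4, order_of_partial_relation(roots, PHI_P))
           for name, roots in TABLE_1.items()}
```

The three examples are recovered as a plane, a prime and a line respectively, in agreement with the equations from which they were constructed.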

If, however, we examine the roots according to the ‘perpendiculars’ method (Tintner’s theory) we find the following table:


Table 3. Significance test for ‘perpendiculars’ method

                            $\Lambda_r$ for $r$ equal to
Example                   1          2            3
  I                      19.6       52.3         176
  II                     20.7       73.8***      520***
  III                    11.3       35.5         70

D.F. after Tintner       21         44           69










This leads to exactly the same conclusion as Table 2. It has been explained in an earlier 
section that in many cases the conclusions will be the same, but that cases may arise in 
which there are differences, as the ‘rational’ definition is more exacting than the ‘per- 
pendiculars’ definition. 
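
The entries of Table 3 appear to be reproduced, to the accuracy of the rounded roots in Table 1, by $(n-1)$ times the sum of the $r$ smallest roots, and the degrees of freedom in its last row by $r(n-m+r-1)$. Both expressions are inferred here from the printed numbers rather than quoted from Tintner, so the Python sketch below should be read as a consistency check under that assumption; the function names are hypothetical.

```python
from typing import List, Sequence


def tintner_statistics(roots: Sequence[float], n: int) -> List[float]:
    """(n - 1) * (sum of the r smallest roots) for r = 1, ..., m - 1.

    Assumed form, inferred from the entries of Table 3.
    """
    ordered = sorted(roots)
    out, running = [], 0.0
    for root in ordered[:-1]:
        running += root
        out.append((n - 1) * running)
    return out


def tintner_dof(n: int, m: int, r: int) -> int:
    """Degrees of freedom r(n - m + r - 1); matches the last row of Table 3."""
    return r * (n - m + r - 1)


n_obs, m_dim = 25, 4
stats_I = tintner_statistics((0.818, 1.362, 5.150, 54.99), n_obs)
stats_II = tintner_statistics((0.863, 2.213, 18.60, 198.0), n_obs)
stats_III = tintner_statistics((0.469, 1.010, 1.417, 140.6), n_obs)
dof = [tintner_dof(n_obs, m_dim, r) for r in (1, 2, 3)]
```

Small discrepancies in the last figure are to be expected, since the table was presumably computed from unrounded roots.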


7-3. Determination of the ‘best’ relation from the rational definition


All calculations relating to the determination of the ‘best’ relationship and the testing of theoretical relationships should be done in the ‘normalized’ co-ordinates, used throughout in the theory. Thus, if $Z_i$ represents the standard current co-ordinates, then $x_i$, the normalized co-ordinates, are given by $Z_i = \sigma_i x_i + \bar{Z}_i$, where $\sigma_i$ is the standard deviation of the error in $Z_i$. To determine the ‘best’ relation, the direction cosines of the principal axes of the acceptance quadric are computed. These are the principal minors of the matrix $|U - \phi I|$ with $\phi$ given in turn the values $\phi_1$, $\phi_2$, $\phi_3$, $\phi_4$, as in Table 4.


Table 4. Direction ratios of principal axes

                1            2            3            4
$\phi_1$     $l_{11}$     $l_{12}$     $l_{13}$     $l_{14}$
$\phi_2$     $l_{21}$     $l_{22}$     $l_{23}$     $l_{24}$
$\phi_3$     $l_{31}$     $l_{32}$     $l_{33}$     $l_{34}$
$\phi_4$     $l_{41}$     $l_{42}$     $l_{43}$     $l_{44}$











If the relationship has been shown to be primal, only the first row is needed and leads at once to

$$l_{11}x_1 + l_{12}x_2 + l_{13}x_3 + l_{14}x_4 = 0$$

as the ‘best’ relationship.










Equally, of course, this prime could have been obtained as that defined by the remaining 
three axes of the quadric, but this involves more computation. 

On the other hand, if the relationship is a line, then it is defined by the intersection of the first three principal primes of the quadric or by the remaining axis of the quadric. This last definition involves very little computation and leads at once to

$$\frac{x_1}{l_{41}} = \frac{x_2}{l_{42}} = \frac{x_3}{l_{43}} = \frac{x_4}{l_{44}}.$$

Where the relationship is a plane, it can be defined either by the first two primes or by 
the last two principal axes of the quadric. The computation is the same in each case and 
there is little to choose unless either the first two or last two roots of the matrix are near 
together, in which case the use of the spaced roots will avoid the necessity of carrying a 
large number of figures. 

In Example I above the direction ratios are 


Table 5. Principal axes, Example 1 











               1          2          3          4
$\phi_1$    +63.44     +37.99     -45.64     -22.76
$\phi_2$    -26.84     +41.48      +6.19     -22.94
$\phi_3$     +4.33     -87.09     +12.23    -158.77
$\phi_4$    -34.49     -12.65     -59.14      +1.44























The relationship has been shown to be a plane and the ‘best’ plane is therefore defined by

$$63.44x_1 + 37.99x_2 - 45.64x_3 - 22.76x_4 = 0,$$
$$-26.84x_1 + 41.48x_2 + 6.19x_3 - 22.94x_4 = 0.$$

The remaining two rows need not have been computed. After a little rearranging these
become

$$x_1 - 0.58x_3 - 0.02x_4 = 0,$$
$$x_2 - 0.23x_3 - 0.56x_4 = 0.$$

The ‘true’ equations in Example I were, after changing to co-ordinates referred to the mean
as origin,

$$x_1 - 0.61x_3 - 0.12x_4 - 0.28 = 0,$$
$$x_2 - 0.27x_3 - 0.60x_4 - 0.21 = 0, \qquad (7.1)$$



showing good agreement with the estimated ‘best’ relationship. 
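
The ‘little rearranging’ amounts to solving the first two rows of Table 5 for $x_1$ and $x_2$ in terms of $x_3$ and $x_4$. The following Python sketch repeats that arithmetic; the function name `solve_for_x1_x2` is illustrative and not part of the paper, and Cramer's rule is merely one convenient way of carrying out the elimination.

```python
# Direction ratios of the first two principal axes (rows phi_1 and phi_2 of Table 5).
row1 = [63.44, 37.99, -45.64, -22.76]
row2 = [-26.84, 41.48, 6.19, -22.94]


def solve_for_x1_x2(r1, r2):
    """Express x1 and x2 as linear functions of x3 and x4 by Cramer's rule.

    Each row is a*x1 + b*x2 + c*x3 + d*x4 = 0, i.e. a*x1 + b*x2 = -(c*x3 + d*x4).
    Returns two lists: the coefficients of (x3, x4) in x1 and in x2.
    """
    det = r1[0] * r2[1] - r1[1] * r2[0]
    rhs1 = [-r1[2], -r1[3]]  # right-hand side of the first equation
    rhs2 = [-r2[2], -r2[3]]  # right-hand side of the second equation
    x1 = [(rhs1[k] * r2[1] - r1[1] * rhs2[k]) / det for k in range(2)]
    x2 = [(r1[0] * rhs2[k] - rhs1[k] * r2[0]) / det for k in range(2)]
    return x1, x2


x1_coef, x2_coef = solve_for_x1_x2(row1, row2)
print("x1 =", x1_coef[0], "* x3 +", x1_coef[1], "* x4")
print("x2 =", x2_coef[0], "* x3 +", x2_coef[1], "* x4")
```

The computed coefficients agree with those of the rearranged equations to within a unit in the second decimal place.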


7.4. Test of theoretical relation under rational definition


To test a theoretical relationship, either the value of ¢ may be computed (in the case of 
a prime) or the reality of the intersections with the acceptance quadric may be investigated 
(all cases). In the case of a prime or line, neither of these presents any difficulty but in
intermediate cases, it is not quite so simple. We use again Example I (the plane (7.1) in
four dimensions) to illustrate the technique. 










R. L. BROWN AND F. FEREDAY 149
Computation is simplified by changing the axes to the principal axes of the quadric and 
this is easily effected by a transformation of the type 


$$x_j = \sum_{i=1}^{4} \frac{l_{ij}\,y_i}{\sqrt{(l_{i1}^2 + l_{i2}^2 + l_{i3}^2 + l_{i4}^2)}},$$

where $l_{ij}$ are given in Table 4.
This transformation of (7.1) leads to the new equations

$$79.94y_1 - 38.65y_2 + 7.05y_3 + 1.25y_4 - 21.56 = 0,$$
$$45.06y_1 + 62.08y_2 + 1.88y_3 + 1.99y_4 - 25.47 = 0.$$

We may now make use of condition (5.12) above for every prime through this plane to be
acceptable. This leads to

$$V = \begin{vmatrix} 4808.19 & 1545.07 \\ 1545.07 & 1764.00 \end{vmatrix} > 0, \qquad 4804.19 > 0.$$


These conditions are plainly satisfied and the plane is therefore acceptable. 
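
The two conditions can be confirmed by direct arithmetic. The short Python sketch below evaluates the second-order determinant from the entries as printed; the variable names are illustrative only.

```python
# Entries of the symmetric 2 x 2 determinant V, as printed in the text.
v11, v12, v22 = 4808.19, 1545.07, 1764.00

det_v = v11 * v22 - v12 * v12            # value of the determinant V
plane_acceptable = det_v > 0 and v11 > 0  # both conditions must hold

print(det_v, plane_acceptable)
```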


7.5. Examination of subsets


Finally, it is interesting to examine the subsets amongst the variates. Table 6 gives roots 
of the determinantal equations obtained when each variate in turn is omitted, so that we 
treat the experimental data as if we had only the measurements of three variates from 
amongst the four which satisfy the structural relation. It will be noted that the roots fall 
between those (Table 1) of the four variate determinantal equation. 


Table 6. Roots of determinantal equation for subsets among three variates 




















Example   Omitting                Roots
I         $x_1$      0.925      2.174**    54.46**
          $x_2$      1.333      5.044**    15.84**
          $x_3$      0.941      4.210**    52.75**
          $x_4$      1.183      5.133**    41.39**
II        $x_1$      1.234     13.69**     19.85**
          $x_2$      1.574*     5.64**     19.28**
          $x_3$      2.204**    3.94**     18.95**
          $x_4$      1.606*    17.04**     19.61**
III       $x_1$      0.784      1.348     128.4**
          $x_2$      0.640      1.334     128.3**
          $x_3$      0.533      1.225      53.56**
          $x_4$      0.712      1.020     112.6**











Asterisks: *, 0.05; **, 0.01 levels of significance.


In Example I we find that the plane (twofold) in four dimensions yields a plane in each 
of the three dimensions. In Example III the line in four dimensions yields a line in each of 
the three dimensions. But in Example II we find in three cases no relation (as would be 
expected) and in one case that a plane would be acceptable. Although this must be regarded 










as accidental it serves to illustrate the dangers of attempting to find the nature of a relation 
amongst m-variates from studies of the relationships between subsets of these variates. 


7.6. Qualitative methods of analysis


If we knew nothing of the errors to which the measurements were subjected, we could 
calculate the partial correlation coefficients. It was found that in Example I, the zero, 
first- and second-order correlations between x, and x, were significant at a probability level 
0-001; in Example II, zero, first- and second-order correlations between x, and x, are also 
zero, first- and second-order between x, and x, were significant at 0-001; in Example III 
all zero order correlations were significant at level 0.02. Evidently Example III gives a clear
pattern suggesting a connexion of each variate with each of the others. For the other
examples the pattern is irregular and it is difficult to draw clear-cut conclusions.

The multiple correlation coefficients were also calculated. These may be regarded as 
giving a measure of the efficiency of regression equations of one variate on one or more of 
the others. In nearly every case the coefficients proved to be significant, suggesting that 
for each example there is an underlying relationship. Except in the case of Example III,
where every coefficient was significant at a probability level of 0-001, there were non- 
significant multiple correlations, so that again it was difficult to draw clear-cut conclusions. 

The method of confluence analysis has also been applied to these data. Here we compare 
two of the multiple regressions, for example that of x, on x5, x, with that of x, on 2%, 2.
Rearranging the regression coefficients, their scatter or proximity indicates the likelihood 
of there being present a true relation. The comparison is qualitative and the procedure is to 
plot all the possible combinations. It was found difficult to make any clear deductions in 
the cases of Examples I and II, but again Example III showed that all the pairs of zero 
order regressions were markedly similar, indicating a strong underlying connexion between 
the variates. 


The authors are grateful to Mr W. D. Ray who kindly checked the paper. 


REFERENCES 

BARTLETT, M. S. (1934). The vector representation of a sample. Proc. Camb. Phil. Soc. 30, 327.

BARTLETT, M. S. (1937). Properties of sufficiency and statistical tests. Proc. Roy. Soc. A, 160, 268.

BARTLETT, M. S. (1941). The statistical significance of canonical correlations. Biometrika, 32, 29.

BROWN, R. L. (1957). Bivariate structural relation. Biometrika, 44, 84.

FISHER, R. A. (1938). The statistical utilization of multiple measurements. Ann. Eugen., Lond., 8, 376.

FRISCH, R. (1934). Statistical Confluence Analysis by Means of Complete Regression Systems. Oslo:
Universitetets Økonomiske Institutt.

HOTELLING, H. (1933). Analysis of a complex of statistical variables into principal components.
J. Educ. Psychol. 24, 417, 498.

HOTELLING, H. (1936). Simplified calculation of principal components. Psychometrika, 1, 27.

HSU, P. L. (1939). On the distribution of roots of certain determinantal equations. Ann. Eugen.,
Lond., 9, 250.

HSU, P. L. (1940). On generalized analysis of variance. Biometrika, 31, 221.

MUDGETT, B. D. & FRISCH, R. (1931). Statistical correlations and the theory of cluster types. J. Amer.
Statist. Ass. 26, 375.

STONE, R. (1945). The analysis of market demand. J. R. Statist. Soc. 108, 286.

TINTNER, G. (1945). A note on rank, multicollinearity and multiple regression. Ann. Math. Statist.
16, 304.

TURNBULL, H. W. & AITKEN, A. C. (1932). Theory of Canonical Matrices. London: Blackie & Son, Ltd.









































APPENDIX
Original data of constructed examples

(a) Example I

         $x_1$     $x_2$     $x_3$     $x_4$
  1      48.42     28.02     35.69     24.93
  2      19.64     13.89     16.05     26.65
  3     -11.72     12.39     -5.10     17.83
  4       6.20     16.24      2.73     31.13
  5      12.53     17.04     15.20     16.49
  6      26.46     27.09     16.08     31.90
  7      54.92     22.56     36.24     29.01
  8      31.26     lll       18.31     10.38
  9      14.12     26.61     11.06     34.64
 10       9.29      3.58      4.32      6.54
 11      23.00     19.72     12.11     25.84
 12      47.78     21.28     31.61      8.59
 13      -5.96      3.22     -4.13     18.23
 14      23.94      3.40      9.17      7.12
 15      17.25     12.47     16.78      9.46
 16      28.42     11.37     17.05     30.77
 17       0.06      7.89     -0.73     26.74
 18      37.33     17.36     31.68     22.14
 19      18.84     13.49     10.86     26.80
 20      47.45     23.17     33.26      3.55
 21      16.71     15.72      9.96      7.16
 22      49.96     29.13     37.76     25.38
 23      35.48     21.66     33.72      9.31
 24      20.40     13.88     10.32     23.29
 25      21.77     23.99     10.58     27.68

 S.D.     4.78      3.96      2.10      4.78











APPENDIX (cont.)
(b) Example II

         $x_1$     $x_2$     $x_3$     $x_4$
  1      35.36     29.26     28.80     31.42
  2      27.42     29.68     40.01     36.81
  3      27.20     30.15     -2.45      9.55
  4      23.92     13.63     35.02     22.83
  5      23.42     30.40      0.63     19.42
  6       7.99     12.46      9.04     17.52
  7       6.90     13.30     30.51     16.27
  8      16.11     25.44     23.61     23.34
  9      21.90     17.63     19.82     23.73
 10       6.54     15.97      5.84      5.03
 11      22.80     34.81     18.82     29.82
 12      22.22     23.64     18.91      8.83
 13      14.46     19.41    -17.10     -1.46
 14      11.79     11.98     12.45      9.56
 15       8.61      5.61     16.55     15.47
 16       8.42     20.41     -0.96     15.82
 17       6.71      8.77     27.43     10.06
 18       6.71     21.16      2.85     17.18
 19       9.06     18.73     -9.14      5.15
 20       5.69      9.87     12.84      9.41
 21      15.96     27.48     12.26     31.41
 22      27.37     21.54      5.29      4.18
 23      15.02     25.10      9.90     24.84
 24      21.03     36.70    -16.49      7.26
 25      12.69     17.27     24.62     27.58

 S.D.     4.78      2.10      3.96      3.96




















































APPENDIX (cont.)
(c) Example III

         $x_1$     $x_2$     $x_3$     $x_4$
  1      16.37     30.95     27.65     19.11
  2      31.69     48.67     44.91     34.96
  3      20.96     29.35     26.63     24.53
  4      14.01     22.46     18.23     24.59
  5      20.56     25.29     26.89     23.57
  6       3.81      0.43     -1.58      5.92
  7      35.00     62.99     57.98     49.07
  8       4.58     12.19     17.68      5.30
  9      17.89     24.74     15.36     17.96
 10      13.17     19.80     24.32     16.09
 11      24.23     43.99     37.38     30.83
 12      26.40     48.90     36.92     35.49
 13      35.38     56.90     48.81     39.20
 14      23.07     48.85     46.37     29.62
 15      33.68     62.22     55.88     44.48
 16       0.22     -1.91     71-34      1.98
 17      -0.97      2.93      7.02      3.16
 18      18.66     32.60     27.27     20.41
 19      15.20     25.08     19.09     20.93
 20      16.11     28.51     19.44     16.29
 21       5.06     14.17     12.37     10.71
 22      11.51     23.69     12.70      7.18
 23      16.90     26.69     14.64     17.45
 24       3.75      5.99      6.81      2.06
 25       1.19     -0.39     -3.72     -2.65

 S.D.     2.10      2.10      4.78      3.96

























MULTIVARIATE RATIO ESTIMATION FOR FINITE POPULATIONS†


By INGRAM OLKIN 
University of Chicago and Michigan State University 


1. INTRODUCTION AND SUMMARY 


In sample surveys precision in estimating the unknown mean $\bar{Y}$ of a finite population may
be increased by using an auxiliary variable $X$, which is correlated with $Y$, and whose mean
$\bar{X}$ is known. Two such estimates are ratio and regression estimates. This paper is concerned
with the extension of ratio estimation to the case where multi-auxiliary variables are used
to increase precision.

In the univariate case a simple random sample $(x_1, y_1), \ldots, (x_n, y_n)$ from a finite population
$(X_1, Y_1), \ldots, (X_N, Y_N)$ is observed. The mean $\bar{X}$ is known, and $\bar{Y}$ is to be estimated. The
estimator

$$\hat{y} = \frac{\bar{y}}{\bar{x}}\bar{X} = r\bar{X}$$

is called the ratio estimate of $\bar{Y}$. In general $\hat{y}$ is biased, and for large $n$, approximations for
$E\hat{y}$ and $V(\hat{y})$ are given by

$$E\hat{y} \doteq \bar{Y} + \frac{N-n}{Nn}\bar{Y}(c_{xx} - c_{xy}),$$
$$V(\hat{y}) \doteq \frac{N-n}{Nn}\bar{Y}^2(c_{xx} + c_{yy} - 2c_{xy}),$$

where $c_{xx} = S_{xx}/\bar{X}^2$, $c_{yy} = S_{yy}/\bar{Y}^2$, $c_{xy} = S_{xy}/\bar{X}\bar{Y}$, and $S_{xy}$ is the covariance between $X$ and
$Y$ (Cochran, 1953, pp. 115-16). Hartley & Ross (1954) have shown that

$$\bar{r}\bar{X} + \frac{(N-1)n}{N(n-1)}(\bar{y} - \bar{r}\bar{x})$$

is an unbiased estimator of $\bar{Y}$, where $n\bar{r} = \sum y_i/x_i$ ($x_i$ is assumed to be positive).
It is easily shown that $\hat{y}$ is a consistent estimator of $\bar{Y}$ in the sense of Cochran (1953,
p. 13), i.e. $\hat{y} \to \bar{Y}$ as $n \to N$, and also in the sense of Hansen, Hurwitz & Madow (1953, p. 74),
i.e. $\operatorname{plim}_{n\to\infty}\hat{y} = \bar{Y}$ with the restrictions: (i) as $n$ increases, $N$ increases with $n < \theta N$, $0 < \theta < 1$,
and (ii) $\bar{Y}$ remains constant as $N$ increases.
In the multivariate extension we have the following model. Population:

$$Y_1, \ldots, Y_N, \qquad \bar{Y} \text{ unknown},$$
$$X_{11}, \ldots, X_{1N}, \qquad \bar{X}_1 \ne 0 \text{ known}, \qquad R_1 = \bar{Y}/\bar{X}_1,$$
$$X_{p1}, \ldots, X_{pN}, \qquad \bar{X}_p \ne 0 \text{ known}, \qquad R_p = \bar{Y}/\bar{X}_p,$$

and the $(p+1)\times(p+1)$ covariance matrix $S$ is known. The subscripts $0, 1, \ldots, p$ refer to
$Y, X_1, \ldots, X_p$, respectively; e.g. $\rho_{0i}$ is the correlation between $Y$ and $X_i$. Higher moments


† This work was supported by the Office of Ordnance Research, U.S. Army, and the Office of Naval
Research and was carried out while the author was on leave from Michigan State University.










will have superscripts referring to the variables and subscripts to the powers, e.g.

$$\mu^{ij}_{12} = \frac{1}{N}\sum_{k=1}^{N}(X_{ik} - \bar{X}_i)(X_{jk} - \bar{X}_j)^2.$$

Finally, $S_{ij} = N\mu^{ij}_{11}/(N-1)$ denotes the covariance and $c_i = S_i/\bar{X}_i$ the coefficient of variation.
The later development will be considerably simplified if we have a notation for moments
divided by means, thus $\omega^{ij}_{12} = \mu^{ij}_{12}/\bar{X}_i\bar{X}_j^2$, etc.

A simple random sample $(y_j, x_{1j}, \ldots, x_{pj})$ $(j = 1, \ldots, n)$, from the population is observed.
The proposed ratio estimate of $\bar{Y}$ is

$$\hat{y} = w_1r_1\bar{X}_1 + \ldots + w_pr_p\bar{X}_p, \qquad (1.1)$$

where $w = (w_1, \ldots, w_p)$, $\sum w_i = 1$, is a weighting function, and $r_i = \bar{y}/\bar{x}_i$.

As in the univariate case $\hat{y}$ is biased in general, and a large sample approximation for the
mean, variance, and mean square error to $O(n^{-2})$ is given in § 2. Because of the complicated
form of the terms of $O(n^{-2})$ and their dubious value, only terms of $O(n^{-1})$ will be considered.
The Hartley-Ross estimator can be generalized so that

$$\sum_{i=1}^{p} w_i\bar{r}_i\bar{X}_i + \frac{(N-1)n}{N(n-1)}\left(\bar{y} - \sum_{i=1}^{p} w_i\bar{r}_i\bar{x}_i\right)$$

is an unbiased estimator of $\bar{Y}$, where $n\bar{r}_i = \sum_{j=1}^{n} y_j/x_{ij}$.


Consistency in the multivariate case (both senses) follows from the fact that we have a 
linear combination of consistent estimates. 

In §3, an ‘optimal’ weight function is considered, namely, that w which minimizes the 
variance. Some special examples are given. An estimate of $V(\hat{y})$ is given in § 4, and in § 5
comparisons between mean estimation using simple random sampling and ratio estimation,
and between univariate and multivariate ratio estimation are made. An example is discussed
in § 6, where the population consists of the number of inhabitants in 200 large cities
in 1930 (the five largest are excluded), and a sample of size 50 is taken. Here $\bar{Y}$, $\bar{X}_1$ and $\bar{X}_2$
are the 1950, 1940 and 1930 mean number of inhabitants.

Even in the univariate case, if the population is stratified, several different ratio estimates 
may be constructed. Two such are (i) separate ratio estimate, and (ii) combined ratio 
estimate. Generalizations of (i) and (ii) are treated in §7, and in §8 asymptotic normality 
is discussed. 


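2. MEAN AND VARIANCE
From (1.1) we have

$$E\hat{y} = \bar{Y}\sum w_iE(r_i/R_i),$$
$$V(\hat{y}) = \bar{Y}^2\sum w_iw_j\operatorname{cov}(r_i, r_j)/R_iR_j.$$

In order to obtain approximations for $Er_i$ and $\operatorname{cov}(r_i, r_j)$, we employ the usual delta method.
Let $\delta_{ij} = (x_{ij} - \bar{X}_i)/\bar{X}_i$, $\epsilon_i = \bar{\delta}_i = (\bar{x}_i - \bar{X}_i)/\bar{X}_i$,
$i = 0, 1, \ldots, p$; $j = 1, \ldots, n$. If $|\epsilon_i| < 1$, $i = 1, \ldots, p$, then

$$r_i = \frac{\bar{y}}{\bar{x}_i} = \frac{\bar{Y}(1+\epsilon_0)}{\bar{X}_i(1+\epsilon_i)} = R_i(1+\epsilon_0)(1 - \epsilon_i + \epsilon_i^2 - \ldots)$$
$$\doteq R_i[1 + (\epsilon_0 - \epsilon_i) + (\epsilon_i^2 - \epsilon_i\epsilon_0) + (\epsilon_0\epsilon_i^2 - \epsilon_i^3) + (\epsilon_i^4 - \epsilon_i^3\epsilon_0)]$$
$$= R_i[1 + \alpha_i + \beta_i + \gamma_i + \delta_i]. \qquad (2.1)$$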
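
The quality of the truncated expansion (2.1) may be judged numerically. The Python sketch below compares the exact ratio $r_i/R_i = (1+\epsilon_0)/(1+\epsilon_i)$ with the sum $1 + \alpha_i + \beta_i + \gamma_i + \delta_i$; the function name and the trial values of $\epsilon_0$ and $\epsilon_i$ are hypothetical and chosen only for illustration.

```python
def ratio_expansion(e0, ei):
    """Return the exact value of r_i / R_i and its truncated expansion (2.1)."""
    exact = (1 + e0) / (1 + ei)
    alpha = e0 - ei                # first-order term
    beta = ei**2 - ei * e0         # second-order term
    gamma = e0 * ei**2 - ei**3     # third-order term
    delta = ei**4 - ei**3 * e0     # fourth-order term
    return exact, 1 + alpha + beta + gamma + delta


# Hypothetical relative errors of the sample means of y and x_i.
exact, approx = ratio_expansion(0.05, -0.08)
print(exact, approx, exact - approx)
```

The neglected remainder is of the fifth order in the $\epsilon$'s, so for relative errors of a few per cent the two values differ only in the sixth decimal place.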








Remark. If the $x_{ij}$ are positive, then $|\epsilon_i| < 1$; a detailed argument by Koop is given in
(Sukhatme, 1954, p. 141).
The following computations are easily made (e.g. see Sukhatme (1954)).

$$E\epsilon_i = 0, \qquad (2.2a)$$

$$nE\epsilon_i\epsilon_j = \frac{N-n}{N-1}\,\omega^{ij}_{11}, \qquad (2.2b)$$

$$n^2E\epsilon_i\epsilon_j\epsilon_k = \frac{(N-n)(N-2n)}{(N-1)(N-2)}\,\omega^{ijk}_{111}, \qquad (2.2c)$$


N—n)(N?2+N-—6nN+6n2) ,.. 
n> Hey, €;€;€, = ( We 1) iv 2) av — ) hise 
3N(N—n)(N—n-1) 
(N —1)(N—2)(N-3) 








(n—1) [whi off + off off + off wif]. (2-2d) 


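We note that the covariance matrix

$$E(\epsilon_0, \ldots, \epsilon_p)'(\epsilon_0, \ldots, \epsilon_p) = \frac{N-n}{N}\,\frac{C}{n}, \qquad (2.3)$$

where $C = (c_{ij}): (p+1)\times(p+1)$, $c_{ij} = S_{ij}/\bar{X}_i\bar{X}_j = \rho_{ij}c_ic_j$,
is assumed to be positive definite.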


2.1. Approximation to $O(n^{-2})$
Using (2.1), (2.2), and collecting terms to $O(n^{-2})$, we obtain








1N- N-n)(N- ; 
BlrdR) = Say 08) + | Gy are) Rod 
3N(N—n)(N-—n-1) , 
+ OH —ay area ole ati 
= 1+b,/n+a,/n?, (2-4) 
cov (r;,7;)/R,R; = Ha,(a;+f;+y;)+Ha,(6;,+y;)+ £h,h;—EP, EB; 
_1N-n 0j 
° sn. j (42 - O - off + wif 


1 ((N —n) (N — 2n) 


n? | (N—1)(N—2) 
+ 2(wfj + wf} — wif)? — 2w}(w§ + w§ — wif) + wf{(w} — wf) 
N- N- 
Gan (3 — wf) (wd — w9f) + 2 V- ay wil 
Laaelats ' (2-5) 





[3(w3 + w§) (wo — wf — off + wif) 


+ wh{(w3 — w%j)] — 


If we define the vectors $b = (b_1, \ldots, b_p)$, $a = (a_1, \ldots, a_p)$, and matrices $A = (a_{ij})$,
$B = (b_{ij}): p \times p$, then

$$E\hat{y} = \bar{Y} + \frac{\bar{Y}}{n}wb' + \frac{\bar{Y}}{n^2}wa', \qquad (2.6)$$









$$V(\hat{y}) = \frac{\bar{Y}^2}{n}\,w\left(A + \frac{B}{n}\right)w' + O(n^{-3}), \qquad (2.7)$$

$$M(\hat{y}) = \frac{\bar{Y}^2}{n}\,w\left(A + \frac{B + b'b}{n}\right)w' + O(n^{-3}). \qquad (2.8)$$

In the univariate case, Cochran (1940) has investigated the effects of the terms of $O(n^{-2})$
and concludes that unless $n$ is too small, the approximation to $O(n^{-1})$ may be considered as
adequate. Because of this and for the sake of simplicity, we will use the approximations to
$O(n^{-1})$. It should be pointed out, however, that parts of the development can very easily
be duplicated with the correspondence $A + (B + b'b)/n$ instead of $A$ in the $M(\hat{y})$, $A + B/n$
for $A$ in the $V(\hat{y})$ and $b + a/n$ for $b$ in $E\hat{y}$. We further note that

$$b_i = \frac{N-n}{N}(c_i^2 - \rho_{0i}c_0c_i),$$

$$a_{ij} = \frac{N-n}{N}(c_0^2 - \rho_{0i}c_0c_i - \rho_{0j}c_0c_j + \rho_{ij}c_ic_j).$$

The matrix $A$ is the covariance matrix of $(\epsilon_0 - \epsilon_1, \ldots, \epsilon_0 - \epsilon_p)$ and is equal to $TCT'$, where $C$
is defined in (2.3) and

$$T = \begin{pmatrix} 1 & -1 & 0 & \ldots & 0 \\ 1 & 0 & -1 & \ldots & 0 \\ \vdots & & & & \vdots \\ 1 & 0 & 0 & \ldots & -1 \end{pmatrix}.$$

Clearly, $A$ is at least positive semi-definite. Since $T: p \times (p+1)$ is of rank $p$ and $C$ is positive
definite, it follows that $A$ is positive definite.
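
The relation $A = TCT'$ can be illustrated with the population values of $C$ tabulated for the example of § 6 ($p = 2$). The following Python sketch forms the product with the finite population correction omitted; the helper names `matmul` and `transpose` are illustrative only.

```python
# Population matrix C for Y, X_1, X_2 (upper triangle as tabulated in § 6, made symmetric).
C = [[1.049, 1.059, 1.056],
     [1.059, 1.098, 1.108],
     [1.056, 1.108, 1.131]]

# T takes (eps_0, eps_1, eps_2) into (eps_0 - eps_1, eps_0 - eps_2).
T = [[1, -1, 0],
     [1, 0, -1]]


def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]


A = matmul(matmul(T, C), transpose(T))
b = [C[i][i] - C[0][i] for i in (1, 2)]   # b_i = c_i^2 - rho_0i c_0 c_i

for row in A:
    print([round(v, 3) for v in row])
print([round(v, 3) for v in b])
```

The result reproduces the tabulated population values of $A$ (0.029, 0.042, 0.068) and of $b$ (0.039, 0.075).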


3. CHOICE OF A WEIGHT FUNCTION 
The criterion for optimality of the weight vector $w = (w_1, \ldots, w_p)$ with $\sum w_i = 1$ is to minimize
$V(\hat{y})$. To obtain the extremum, we make use of the generalized Cauchy inequality

$$(xy')^2 \le (xMx')(yM^{-1}y'),$$

where $M$ is a symmetric positive definite matrix. The equality holds if and only if $xM = \theta y$,
where $\theta \ne 0$ is a scalar.
To apply the lemma, let $e = (1, \ldots, 1)$ and make the correspondence $x = w$, $y = e$, $M = A$.
Thus

$$1 = (we')^2 \le (wAw')(eA^{-1}e'),$$

and the equality is achieved if and only if $wA = \theta e$ or $w = \theta eA^{-1}$. By the restriction $we' = 1$,
it follows that $\theta = 1/(eA^{-1}e')$, and hence the optimum $w$ is given by

$$\hat{w} = \frac{eA^{-1}}{eA^{-1}e'}. \qquad (3.1)$$

Insertion of $\hat{w}$ in (2.6) and (2.7) yields

$$E\hat{y} = \bar{Y} + \frac{\bar{Y}}{n}\,\frac{eA^{-1}b'}{eA^{-1}e'}, \qquad (3.2)$$

$$V(\hat{y}) = \frac{\bar{Y}^2}{n}\,\frac{1}{eA^{-1}e'}. \qquad (3.3)$$
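
As an illustration of (3.1) and (3.3), the Python sketch below computes the optimum weights for the population and sample matrices $A$ tabulated in § 6. The elimination routine `solve` and the function `optimal_weights` are our own illustrative helpers, not part of the paper; any linear-equation solver would serve equally well.

```python
def solve(A, rhs):
    """Solve A z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def optimal_weights(A):
    """Return (w, 1 / (e A^{-1} e')) for a symmetric positive definite matrix A."""
    z = solve(A, [1.0] * len(A))   # z = A^{-1} e', which equals (e A^{-1})' by symmetry
    total = sum(z)                 # e A^{-1} e'
    return [zi / total for zi in z], 1.0 / total


w_pop, var_factor = optimal_weights([[0.029, 0.042], [0.042, 0.068]])
w_smp, _ = optimal_weights([[0.033, 0.051], [0.051, 0.082]])
print(w_pop, var_factor)
print(w_smp)
```

The population matrix gives weights close to $(2, -1)$ and the sample matrix weights close to $(2.38, -1.38)$, in agreement with the values quoted in § 6.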








The bias is eliminated if $eA^{-1}b' = 0$. This will hold if $b = 0$, i.e.

$$c_i = \rho_{0i}c_0 \quad\text{or}\quad \bar{Y} = \bar{X}_i\rho_{0i}S_0/S_i \qquad (i = 1, \ldots, p),$$

which occurs when each regression taken individually is through the origin. Except
for certain special cases, the expression $eA^{-1}b' = 0$ is not amenable to a simple
interpretation.

The weights will be uniform if and only if the column sums of $A$ are equal, i.e. $eA = ek$,
where $k \ne 0$ is a scalar. ($k = 0$ implies that $A$ is singular.) Hence $eA^{-1} = e/k$ and $eA^{-1}e' = p/k$,
so that $\hat{w} = e/p$. We also have that $E\hat{y} = \bar{Y} + \bar{Y}eb'/np$, $V(\hat{y}) = \bar{Y}^2k/np$.

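An example which results in uniform weighting is given by

$$c_1 = \ldots = c_p = c, \qquad \rho_{01} = \ldots = \rho_{0p} = \rho_0, \qquad \rho_{ij} = \rho \quad (i \ne j).$$

Then

$$a_{ii} = (N-n)(c_0^2 + c^2 - 2\rho_0c_0c)/N,$$
$$a_{ij} = (N-n)(c_0^2 - 2\rho_0c_0c + \rho c^2)/N,$$
$$b_i = (N-n)(c^2 - \rho_0c_0c)/N,$$

$$E\hat{y} = \bar{Y} + \frac{\bar{Y}}{n}\,\frac{N-n}{N}(c^2 - \rho_0c_0c),$$
$$V(\hat{y}) = \frac{\bar{Y}^2}{np}\,\frac{N-n}{N}[c^2(1-\rho) + p(c_0^2 - 2\rho_0c_0c + \rho c^2)]. \qquad (3.4)$$

If in addition, $c_0 = c$, $\rho_0 = \rho$, then

$$E\hat{y} = \bar{Y} + \frac{\bar{Y}}{n}\,\frac{N-n}{N}\,c^2(1-\rho),$$
$$V(\hat{y}) = \frac{\bar{Y}^2}{n}\,\frac{N-n}{N}\,\frac{p+1}{p}\,c^2(1-\rho). \qquad (3.5)$$

More generally, the row sums of $A$ are equal if

$$c_i\sum_{j=1}^{p}\rho_{ij}c_j - p\rho_{0i}c_0c_i \qquad (i = 1, \ldots, p),$$

are equal.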


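4. ESTIMATE OF $V(\hat{y})$

In the univariate case an estimate of $V(\hat{y})$ is obtained by first noting that

$$V(\hat{y}) \doteq \frac{N-n}{Nn}\,\frac{\sum_{t=1}^{N}(Y_t - RX_t)^2}{N-1},$$

which suggests the estimate

$$v(\hat{y}) = \frac{N-n}{Nn}\,\frac{\sum_{t=1}^{n}(y_t - rx_t)^2}{n-1}.$$

For the multivariate extension we rewrite the matrix $A$ conformable to the above. Ignoring
the finite population correction $(N-n)/N$,

$$a_{ij} = c_0^2 - \rho_{0i}c_0c_i - \rho_{0j}c_0c_j + \rho_{ij}c_ic_j = \frac{S_0^2}{\bar{Y}^2} - \frac{S_{0i}}{\bar{Y}\bar{X}_i} - \frac{S_{0j}}{\bar{Y}\bar{X}_j} + \frac{S_{ij}}{\bar{X}_i\bar{X}_j}$$
$$= \frac{1}{(N-1)\bar{Y}^2}\Big[\Big(\sum Y_t^2 - N\bar{Y}^2\Big) - R_i\Big(\sum X_{it}Y_t - N\bar{X}_i\bar{Y}\Big) - R_j\Big(\sum X_{jt}Y_t - N\bar{X}_j\bar{Y}\Big) + R_iR_j\Big(\sum X_{it}X_{jt} - N\bar{X}_i\bar{X}_j\Big)\Big]$$
$$= \frac{1}{\bar{Y}^2}\,\frac{\sum_{t=1}^{N}(Y_t - R_iX_{it})(Y_t - R_jX_{jt})}{N-1}.$$

Similarly, we can rewrite $b$ as

$$b_i = c_i^2 - \rho_{0i}c_0c_i = \frac{S_i^2}{\bar{X}_i^2} - \frac{S_{0i}}{\bar{Y}\bar{X}_i} = -\frac{1}{\bar{Y}^2}\,\frac{\sum_{t=1}^{N}R_iX_{it}(Y_t - R_iX_{it})}{N-1}.$$

Thus we estimate $\bar{Y}^2a_{ij}$ by

$$\hat{a}_{ij} = \frac{\sum_{t=1}^{n}(y_t - r_ix_{it})(y_t - r_jx_{jt})}{n-1},$$

and $\bar{Y}^2b_i$ by

$$\hat{b}_i = -\frac{\sum_{t=1}^{n}r_ix_{it}(y_t - r_ix_{it})}{n-1}.$$

This permits $V(\hat{y}) = \bar{Y}^2/n(eA^{-1}e')$ to be estimated by $v(\hat{y}) = 1/n(e\hat{A}^{-1}e')$. Similarly, we may
estimate $\hat{w}$ if $A$ is unknown. In general, these estimates will be biased, the bias being of
$O(n^{-1})$ (Cochran, 1953, p. 119). If the $c_i$ are small, then the bias is negligible.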
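
The estimation of $A$, of the weights and of $V(\hat{y})$ from a sample may be sketched in Python as follows. For brevity only the first ten cities of Table 1 are used, so the figures are not those of the worked example in § 6; the variable names are illustrative, and the finite population correction is ignored as in the text.

```python
# First ten cities of Table 1: (1930, 1940, 1950) inhabitants in thousands.
rows = [(670, 672, 677), (104, 101, 116), (50, 64, 95), (292, 385, 593),
        (130, 173, 204), (55, 54, 58), (102, 97, 130), (54, 58, 70),
        (52, 62, 87), (71, 69, 68)]
y = [r[2] for r in rows]    # 1950 values
x1 = [r[1] for r in rows]   # 1940 values (first auxiliary variate)
x2 = [r[0] for r in rows]   # 1930 values (second auxiliary variate)
n = len(rows)


def mean(v):
    return sum(v) / len(v)


r1 = mean(y) / mean(x1)     # sample ratios r_i = ybar / xbar_i
r2 = mean(y) / mean(x2)
d1 = [yt - r1 * xt for yt, xt in zip(y, x1)]   # residuals y_t - r_1 x_1t
d2 = [yt - r2 * xt for yt, xt in zip(y, x2)]   # residuals y_t - r_2 x_2t


def cross(u, v):
    """Sum of products divided by n - 1."""
    return sum(a * c for a, c in zip(u, v)) / (n - 1)


a11, a12, a22 = cross(d1, d1), cross(d1, d2), cross(d2, d2)

# For p = 2, e A^{-1} is proportional to (a22 - a12, a11 - a12).
z1, z2 = a22 - a12, a11 - a12
w_hat = (z1 / (z1 + z2), z2 / (z1 + z2))

# v(y-hat) = 1 / (n e A^{-1} e') = det(A) / (n (a11 + a22 - 2 a12)).
v_hat = (a11 * a22 - a12 ** 2) / (n * (z1 + z2))
print(w_hat, v_hat)
```

The residuals $y_t - r_ix_{it}$ sum to zero by construction, which provides a simple check on the arithmetic.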


5. COMPARISON OF PROCEDURES
5.1. Mean estimation using simple random sampling

It is known (Cochran, 1953), that univariate ratio estimation is superior (in the sense of
smaller variance) to mean estimation, provided

$$\rho_{xy} > \frac{c_x}{2c_y}.$$

In particular, if $c_x = c_y$, $\rho_{xy} > \tfrac{1}{2}$ yields superiority of the ratio estimate. If $c_i = c$, $\rho_{0i} = \rho_0$,
$\rho_{ij} = \rho$ $(i \ne j)$, the pertinent variances (omitting f.p.c.) from (3.4) are

$$V(\bar{y}) = \frac{\bar{Y}^2c_0^2}{n},$$

$$V(\hat{y}) = \frac{\bar{Y}^2}{np}[c^2(1-\rho) + p(c_0^2 - 2\rho_0c_0c + \rho c^2)].$$

This leads to the criterion that the ratio estimate is superior to simple random sampling if

$$\frac{p\rho_0}{1 + (p-1)\rho} > \frac{c}{2c_0}.$$

If in addition $\rho_0 = \rho$ and $c_0 = c$, the criterion simplifies to

$$\rho > \frac{1}{p+1}.$$







5.2. Univariate versus multivariate estimation

In this section we are concerned with the consequences of using the sets of auxiliary
variables $x_1, \ldots, x_p$ or $x_1, \ldots, x_p, x_{p+1}, \ldots, x_q$. The result is given in the following theorem.

THEOREM. Let $V(\hat{y} \mid p)$ and $V(\hat{y} \mid p, q)$ denote the variances of $\hat{y}$ based on the auxiliary
variables $x_1, \ldots, x_p$ and $x_1, \ldots, x_q$, $q > p$. With optimum allocation $\hat{w}$, $V(\hat{y} \mid p) \ge V(\hat{y} \mid p, q)$.

Proof. With weighting $\hat{w}$, we have

$$V(\hat{y} \mid p) = \frac{\bar{Y}^2}{n}\,\frac{1}{eA_p^{-1}e'}, \qquad V(\hat{y} \mid p, q) = \frac{\bar{Y}^2}{n}\,\frac{1}{eA_q^{-1}e'}, \qquad A_q = \begin{pmatrix} A_p & B' \\ B & C \end{pmatrix}.$$

We must show that $eA_q^{-1}e' \ge eA_p^{-1}e'$. The vector $e = (1, \ldots, 1)$ should have an indicator $p$
or $q$ to denote dimensionality, but is omitted for simplicity. From

$$A_q^{-1} = \begin{pmatrix} A_p^{-1} + DFD' & -DF \\ -FD' & F \end{pmatrix},$$

where $F = (C - BA_p^{-1}B')^{-1}$, $D = A_p^{-1}B'$, we have after simplification

$$eA_q^{-1}e' - eA_p^{-1}e' = (eD - e)F(D'e' - e') \ge 0.$$

The latter follows since $F$ is positive definite so that $uFu' > 0$ for all $u \ne 0$.
In the special case previously considered, the difference in the variance is

$$V(\hat{y} \mid p) - V(\hat{y} \mid p, q) = \frac{\bar{Y}^2}{n}\,c^2(1-\rho)\left(\frac{q-p}{pq}\right).$$


6. AN EXAMPLE

Cochran (1953, pp. 23, 113) discusses an example in which the population consists of the
number of inhabitants in 196 large U.S. cities in 1920. We consider a similar example, using
the number of inhabitants in the 200 largest U.S. cities in 1930, excluding the five largest,
with $Y$ = 1950, $X_1$ = 1940, $X_2$ = 1930 values. A simple random sample of size 50 was taken,
the results of which are given in Table 1.

In this example 1699 is the true value of $\bar{Y}$. We now compare the various estimates.

(i) Mean estimate $\bar{y} = 1896$, $\sigma(\bar{y}) = 2088f$, where $f^2 = (N-n)/nN$.

(ii) Ratio estimate, one auxiliary variate

$$\hat{y} = r\bar{X} = \frac{1896}{1693}\,1482 = 1660, \qquad \sigma(\hat{y}) = 289f.$$

(iii) Ratio estimate, two auxiliary variates with

(a) true weights $\hat{y} = 2r_1\bar{X}_1 - r_2\bar{X}_2 = 1681$, $\sigma(\hat{y}) = 217f$;

(b) estimated weights $\hat{y} = 2.38r_1\bar{X}_1 - 1.38r_2\bar{X}_2 = 1689$.

Since $c_0$ is close to $c_1$, and $\rho_{01} = 0.987$, it is clear that ratio estimation is superior to mean
estimation. Similarly, ratio estimation with two variates is preferable to ratio estimation
with one variate.
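
The point estimates quoted above follow directly from the means in the summary table below. A short Python sketch of the arithmetic is given here; the variable names are illustrative only.

```python
# Sample means and known population means, in hundreds of inhabitants.
ybar, x1bar, x2bar = 1896, 1693, 1643   # sample means of Y (1950), X_1 (1940), X_2 (1930)
X1bar, X2bar = 1482, 1420               # population means of X_1 and X_2

r1 = ybar / x1bar                       # ratio on the 1940 values
r2 = ybar / x2bar                       # ratio on the 1930 values

one_variate = r1 * X1bar
two_true_weights = 2 * r1 * X1bar - r2 * X2bar
two_estimated_weights = 2.38 * r1 * X1bar - 1.38 * r2 * X2bar

print(round(one_variate), round(two_true_weights), round(two_estimated_weights))
```

Rounded to the nearest unit the three estimates are 1660, 1681 and 1689, to be compared with the true value 1699 and the simple mean estimate 1896.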










Table 1. Numbers of inhabitants (in thousands) in a random sample of 50 large cities 
in the U.S. in 1930, 1940, 1950 


























 1930   1940   1950        1930   1940   1950
  670    672    677         260    268    326
  104    101    116          68     69     79
   50     64     95          59     56     56
  292    385    593         451    456    504
  130    173    204         116    117    131
   55     54     58          58     59     66
  102     97    130         328    325    332
   54     58     70         781    771    801
   52     62     87         100    101     97
   71     69     68          57     54     55

   55     50     51         106    112    125
  900    878    915         156    152    163
   47     48     53         578    587    637
   79     82     84          75     78     91
   50     49     54          63     65     74
  115    115    112         105    108    117
   55     57     60          51     46     58
  113    110    109          46     51     80
   65     70     82         195    194    203
   64     62     63         364    387    427
   65     67     74         102    101    102
   46     49     56         114    111    121
  148    203    334          63     63     64
  115    110    113         308    319    369
   62     71     70          54     59     74





The following is a summary of the pertinent results for the population and the respective sample estimates.
































                   Population                    Sample
             Mean/100    S.D./100        Mean/100    S.D./100
$Y$            1699        1740            1896        2088
$X_1$          1482        1554            1693        1932
$X_2$          1420        1509            1643        1931

$C$          1.049  1.059  1.056         1.213  1.241  1.256
                    1.098  1.108                1.302  1.335
                           1.131                       1.381

$A$          0.029  0.042                0.033  0.051
                    0.068                       0.082

$b$          (0.039  0.075)              (0.061  0.125)
$\hat{w}$    (2  -1)                     (2.38  -1.38)








7. STRATIFIED SAMPLING
We first recapitulate the theory for a single auxiliary variate where the population consists
of $g$ strata. Two procedures are usually considered:

(i) A separate ratio estimate of $\bar{Y}$ is made for each stratum and then combined.
Specifically

$$\hat{y}_s = \sum_{j=1}^{g}\frac{N_j}{N}\,r_j\bar{X}^{(j)},$$

where $r_j = \bar{y}_j/\bar{x}_j$ $(j = 1, \ldots, g)$.
This estimate is called the separate ratio estimate.

(ii) The conventional stratified sample estimates of $\bar{Y}$ and $\bar{X}$ are made, and a ratio of
these is formed. Specifically,

$$\hat{y}_c = \frac{\bar{y}_{st}}{\bar{x}_{st}}\,\bar{X},$$

where $\bar{y}_{st} = (N_1\bar{y}_1 + \ldots + N_g\bar{y}_g)/N$,
$\bar{x}_{st} = (N_1\bar{x}_1 + \ldots + N_g\bar{x}_g)/N$.

This estimate is called the combined ratio estimate (Hansen, Hurwitz & Gurney, 1946).
For each procedure, the determination of an optimum allocation of the $n_j$ can be made.

We now consider the generalization to the multivariate case. Some of the previous theory
carries through unchanged, but generally, some modifications will be required. We use a
notation similar to that employed for the unstratified case, except that the stratum under
consideration is denoted by a superfix, e.g. $\bar{Y}^{(j)}$ denotes the mean $\bar{Y}$ in the $j$th stratum,
$\bar{X}_i^{(j)}$ the mean of $X_i$ $(i = 1, \ldots, p)$, $S^{(j)}$ the covariance matrix within the $j$th stratum, and so
on $(j = 1, \ldots, g)$.


7.1. Separate ratio estimate
Let

$$\hat{y}^{(j)} = w_1^{(j)}r_1^{(j)}\bar{X}_1^{(j)} + \ldots + w_p^{(j)}r_p^{(j)}\bar{X}_p^{(j)},$$

$\sum_{i=1}^{p} w_i^{(j)} = 1$ $(j = 1, \ldots, g)$, be a ratio estimate of $\bar{Y}^{(j)}$, the mean of stratum $j$. We now form a
linear combination of the strata means

$$\hat{y}_s = \sum_{j=1}^{g}\frac{N_j}{N}\,\hat{y}^{(j)}$$

as an estimate of $\bar{Y}$.

The $g$ weight vectors $w^{(j)} = (w_1^{(j)}, \ldots, w_p^{(j)})$ $(j = 1, \ldots, g)$ are chosen to minimize $V(\hat{y}_s)$.
Since the components of $\hat{y}_s$ are uncorrelated, $V(\hat{y}_s)$ is additive and we may minimize $V(\hat{y}^{(j)})$
for each $j = 1, \ldots, g$. Thus for this case, the previous results remain valid for each component
of $\hat{y}_s$. The results for the mean and variance of $\hat{y}_s$ are then obtained by combining the results
for the components in an obvious fashion.

The optimum allocation of $n_1, \ldots, n_g$ is determined by minimizing $V(\hat{y}_s)$, subject to fixed
cost, i.e. $\sum a_jn_j$ is constant. The result is easily shown to be

$$n_j \propto \frac{N_j\bar{Y}^{(j)}}{\sqrt{a_j\,eA^{(j)-1}e'}}.$$








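7.2. Combined ratio estimate
The usual estimates appropriate to stratified sampling are

$$\bar{y}_{st} = \sum_{j=1}^{g}N_j\bar{y}^{(j)}/N,$$
$$\bar{x}_{i,st} = \sum_{j=1}^{g}N_j\bar{x}_i^{(j)}/N \qquad (i = 1, \ldots, p).$$

The combined ratio estimate for the multivariate case is the linear combination

$$\hat{y}_c = f_1\frac{\bar{y}_{st}}{\bar{x}_{1,st}}\bar{X}_1 + \ldots + f_p\frac{\bar{y}_{st}}{\bar{x}_{p,st}}\bar{X}_p, \qquad \sum f_i = 1,$$

where $f = (f_1, \ldots, f_p)$ is a weighting vector. Let

$$\bar{y}^{(j)} = \bar{Y}^{(j)} + \bar{Y}\epsilon_0^{(j)},$$
$$\bar{x}_i^{(j)} = \bar{X}_i^{(j)} + \bar{X}_i\epsilon_i^{(j)},$$

where $E\epsilon_0^{(j)} = E\epsilon_i^{(j)} = 0$, $V(\epsilon_0^{(j)}) = S_0^{(j)2}/\bar{Y}^2n_j$,
$V(\epsilon_i^{(j)}) = S_i^{(j)2}/\bar{X}_i^2n_j$.
As before

$$\frac{\bar{y}_{st}}{\bar{x}_{i,st}} = \frac{\sum N_j\bar{y}^{(j)}}{\sum N_j\bar{x}_i^{(j)}} = \frac{\sum N_j(\bar{Y}^{(j)} + \bar{Y}\epsilon_0^{(j)})}{\sum N_j(\bar{X}_i^{(j)} + \bar{X}_i\epsilon_i^{(j)})}$$
$$= R_i\Big(1 + \sum N_j\epsilon_0^{(j)}/N\Big)\Big(1 - \sum N_j\epsilon_i^{(j)}/N + \ldots\Big)$$
$$\doteq R_i\Big[1 + \sum N_j(\epsilon_0^{(j)} - \epsilon_i^{(j)})/N + \Big(\sum N_j\epsilon_i^{(j)}\Big)\Big(\sum N_j(\epsilon_i^{(j)} - \epsilon_0^{(j)})\Big)/N^2\Big]$$
$$= R_i[1 + \alpha_i^* + \beta_i^*].$$

Here $\alpha_i^*$ and $\beta_i^*$ take the role of $\alpha_i$ and $\beta_i$ of § 2. Clearly $E\alpha_i^* = 0$, so that

$$E\hat{y}_c = \sum f_i\bar{X}_iR_i[1 + E\beta_i^*] = \bar{Y} + \bar{Y}\sum f_iE\beta_i^*,$$

where

$$E\beta_i^* = \sum_{j,k}N_jN_kE\epsilon_i^{(j)}(\epsilon_i^{(k)} - \epsilon_0^{(k)})/N^2 = \sum_{j}N_j^2E\epsilon_i^{(j)}(\epsilon_i^{(j)} - \epsilon_0^{(j)})/N^2.$$

Similarly we obtain

$$\operatorname{cov}\left(\frac{\bar{y}_{st}}{\bar{x}_{i,st}}, \frac{\bar{y}_{st}}{\bar{x}_{j,st}}\right) = R_iR_jE\alpha_i^*\alpha_j^*,$$

$$E\alpha_i^*\alpha_j^* = \sum_{k}\frac{N_k^2}{N^2n_k}\big(c_0^{(k)2} - \rho_{0i}^{(k)}c_0^{(k)}c_i^{(k)} - \rho_{0j}^{(k)}c_0^{(k)}c_j^{(k)} + \rho_{ij}^{(k)}c_i^{(k)}c_j^{(k)}\big).$$

Hence

$$V(\hat{y}_c) = \bar{Y}^2\sum_{i,j}f_if_jE\alpha_i^*\alpha_j^*.$$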














If we let $A = (a_{ij}): p \times p$, $a_{ij} = E\alpha_i^*\alpha_j^*$, $b = (E\beta_1^*, \ldots, E\beta_p^*)$,
then $V(\hat{y}_c) = \bar{Y}^2fAf'$, $E\hat{y}_c = \bar{Y}[1 + fb']$. Formally we have reduced the problem to the original
framework, so that the previous theory remains valid, e.g. if we wish to choose the weight
vector which minimizes $V(\hat{y}_c)$, the result is given by $f = eA^{-1}/eA^{-1}e'$.

The problem of optimum allocation of $n_1, \ldots, n_g$ subject to $\sum l_kn_k = l$, for any weight
vector $f$ is easily obtained, namely,


IM, = Sif, j Ea? a) Vl. 
i,j 


Nn, = : ——. 
. EN RSS HaP ah 
7) 





However, it is somewhat involved to obtain simultaneously an optimum weight vector
and optimum allocation, since this involves maximizing $e(A^{(1)}/n_1 + \ldots + A^{(g)}/n_g)^{-1}e'$ subject
to the cost restriction.


8. ASYMPTOTIC NORMALITY 
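We first consider the result for an infinite population. Let

$$H(\bar{y}, \bar{x}_1, \ldots, \bar{x}_p) = \sum_{i=1}^{p}w_i\frac{\bar{y}}{\bar{x}_i}\bar{X}_i,$$
$$H_0 = H(\bar{Y}, \bar{X}_1, \ldots, \bar{X}_p) = \bar{Y},$$
$$\left.\frac{\partial H}{\partial\bar{y}}\right|_0 = \sum_{i=1}^{p}w_i = 1, \qquad \left.\frac{\partial H}{\partial\bar{x}_i}\right|_0 = -w_iR_i \qquad (i = 1, \ldots, p).$$

Cramér (1946, p. 366) shows that if $H$ is continuous and has continuous first and second
partial derivatives in some neighbourhood of the point $(\bar{y}, \bar{x}_1, \ldots, \bar{x}_p) = (\bar{Y}, \bar{X}_1, \ldots, \bar{X}_p)$,
then $H$ is asymptotically normal with mean $H_0 = \bar{Y}$ and variance

$$\frac{\bar{Y}^2}{n}\left[c_0^2 - 2\sum_{i=1}^{p}\rho_{0i}c_0c_iw_i + \sum_{i,j=1}^{p}w_iw_j\rho_{ij}c_ic_j\right],$$

which is the same as (2.7) to $O(n^{-1})$. Since we have assumed that $\bar{X}_i \ne 0$ and $y$ and $x_i$ have
finite variances, the above theorem can be applied to $H = \hat{y}$.

We now turn to the finite population case. Let $U^{(1)}, U^{(2)}, \ldots, U^{(k)}, \ldots$ be a sequence of
universes, where the elements of $U^{(k)} = (U_1^{(k)}, U_2^{(k)}, \ldots, U_{N_k}^{(k)})$ are $(Y_j^{(k)}, X_{1j}^{(k)}, \ldots, X_{pj}^{(k)})$, and
suppose that the $U^{(k)}$ satisfy a certain regularity condition. Let

$$z_0 = \frac{\bar{y}^{(k)} - E\bar{y}^{(k)}}{\sigma(\bar{y}^{(k)})}, \qquad z_i = \frac{\bar{x}_i^{(k)} - E\bar{x}_i^{(k)}}{\sigma(\bar{x}_i^{(k)})} \qquad (i = 1, \ldots, p),$$

and suppose $\lim\rho_{ij}^{(k)} = \rho_{ij} > -1 + \epsilon$ $(i \ne j = 0, 1, \ldots, p)$, then the limiting distribution of
$(z_0, z_1, \ldots, z_p)$ is multivariate normal with zero means, unit variances and covariance
matrix $(\rho_{ij})$.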



This is essentially Theorem 3, Madow (1948, p. 544). The regularity condition is his
condition W (p. 539), and is satisfied, for example, if all the elements of $U^{(k)}$ are uniformly
bounded. Madow’s Theorem 3 yields the asymptotic normality of means, when the sample
is selected from a finite population without replacement. This theorem plus the theorem
of Cramér mentioned previously yields the asymptotic normality of $\hat{y}$.


REFERENCES 


COCHRAN, W. G. (1940). The estimation of the yields of cereal experiments by sampling for the ratio
of grain to total produce. J. Agric. Sci. 30, 262.

COCHRAN, W. G. (1953). Sampling Techniques. New York: John Wiley and Sons.

CRAMÉR, H. (1946). Mathematical Methods of Statistics. Princeton University Press.

HANSEN, M. H., HURWITZ, W. N. & GURNEY, M. (1946). Problems and methods of a sample survey
of business. J. Amer. Statist. Ass. 41, 173.

HANSEN, M. H., HURWITZ, W. N. & MADOW, W. G. (1953). Sample Survey Methods and Theory,
Vol. II. New York: John Wiley and Sons.

HARTLEY, H. O. & ROSS, A. (1954). Unbiased ratio estimators. Nature, Lond., 174, no. 4423, 270.

MADOW, W. G. (1948). On the limiting distribution of estimates based on samples from finite universes.
Ann. Math. Statist. 19, 535.

SUKHATME, P. V. (1954). Sampling Theory of Surveys with Applications. Iowa: Iowa State College
Press, Ames.










NON-RANDOMNESS IN A SEQUENCE OF TWO ALTERNATIVES 
I. WILCOXON’S AND ALLIED TEST STATISTICS 


By D. E. BARTON, F. N. DAVID anp C. L. MALLOWS 
University College London 


In a recent communication (Mallows, 1957) ranking models were put forward as alternatives
to randomness using the approach of paired comparison theory. We discuss here an extension
to one of them, a modification of the Bradley-Terry model called by Mallows the $\phi$ model,
as an interpretation and alternative to randomness in the case of a sequence of two
alternatives. Several test criteria are compared under the $\phi$ alternative.

The $\phi$ alternative was developed from the method of paired comparisons and it is this
method which we use here. It is assumed that there is a random sequence of two alternatives,
$r_1$ of one kind ($x$), $r_2$ of another ($y$), with $r_1 + r_2 = r$. This sequence may be assumed to have
been arrived at by each $x$ having been compared with each $y$ and also with every other $x$ and
similarly for $y$. Thus $^rC_2$ comparisons will have been made, all independent. If, following
these comparisons, an inconsistent sequence results (for example, the first $x$ picked up may
be judged $y_1 < x_1 < y_2$, while the second $x$ may be judged $y_2 < x_2 < y_1$), then the sequence is
discarded and further $^rC_2$ comparisons are made. This procedure may be supposed to be
repeated until a consistent set of results is obtained. As was emphasized by Mallows in his
earlier paper, it is not suggested that the person undertaking the ranking does perform an
experiment in this fashion, but rather that the mental process which he goes through leads
to results which are equivalent to those obtained from the model.

Various assumptions can be made for the $\phi$ model. We give here three, all of which lead 
to the same mathematical set-up. It can be supposed: 

(i) There is no difference in the $r_1 + r_2$ elements as regards the characteristic whereby 
they are being ranked. Under the null hypothesis all arrangements are equally likely. 
Under the alternate hypothesis the probability that an $x$ is ranked below a $y$ is $p \neq \tfrac12$. This 
situation might arise, for example, if a person is given a set of $r$ photographs of $r_1$ men and 
$r_2$ women, all of the same age. The judge, asked to arrange this set in order of increasing age, 
will, if he is without bias, produce a random sequence. If, on the other hand, he has an 
unconscious bias, he may tend to consider the women of lower age than the men and in any 
comparison between the two have a probability $> \tfrac12$ of ranking the woman below the man. 
Comparisons between women and between men may still be assumed random. 

(ii) All the $r_1$ elements may be assumed to have the same common rank which is less than 
(or greater than) the common rank of the $r_2$ elements. The null hypothesis will be that there 
is no ability to discriminate between the two ranks. Under the alternate hypothesis there 
is a probability $p \neq \tfrac12$ that an $x$ is ranked lower than a $y$, but as before, the arrangement of 
the $x$'s and of the $y$'s among themselves is random. For example, given $r_1$ photographs of 
men all the same age and $r_2$ photographs of women all the same age, but a different age 
from that of the men, with a request that the photographs be arranged in order of age, a 
judge without competence would place the $r$ photographs in a random order, whereas a 
skilled judge has a probability $p$ in all comparisons between men and women of getting the 
order right. 







(iii) A variant on model (ii) is to assume an absolute underlying ranking for all $r_1 + r_2$ 
elements, with all the $x$'s ranked less than all the $y$'s. Under the null hypothesis the ranks 
are randomly allotted. With the alternate hypothesis every $x$ is compared with every 
other $x$, every $y$ with every other $y$ and every $x$ with every $y$, it being assumed that in each 
of the independent ${}^{r}C_{2}$ comparisons there is a probability $p$ ($\neq \tfrac12$) of getting the two elements 
in correct order. Thus we assume, for example, a series of photographs of men and women 
all of different ages. No competence on the part of the arranger in judging age results in 
a random sequence. The more competent the arranger, the greater will $p$ be. 

Suppose that there are $r_1$ elements ($x$) and $r_2$ elements ($y$), $r_1 \geq r_2$, all ranked. Let $S$ be 
the sum of the ranks of the $r_2$ $y$'s and, following Mann & Whitney (1947), let $U$ be the number 
of times an $x$ is ranked above a $y$. We have that 

\[ U = \tfrac12 r_2(r_2+1) + r_1 r_2 - S. \]

Let $p$ be the probability that an $x$ is ranked below a $y$. Then, if $s_1, s_2, \ldots, s_{r_2}$ are the individual 
ranks of the $y$'s, 

\[ P(s_1, s_2, \ldots, s_{r_2}) \propto p^{\,r_1 r_2 - U} q^{\,U} \propto \phi^{U} \propto \phi^{-S}, \]

where $\phi = q/p$.† It follows that $U$ (or $S$) will be sufficient for $p$ under the set of $\phi$ alternatives. 
Thus if $P_1(U)$ is the probability of obtaining a given value of $U$ under the alternate 
hypothesis and $P_0(U)$ is the corresponding probability under the null hypothesis 

\[ P_1(U) = \frac{P_0(U)\,\phi^{U}}{\sum_U P_0(U)\,\phi^{U}}. \]

Further, if the probability generating function (p.g.f.) under the null hypothesis is 

\[ G_0(t) = \sum_U P_0(U)\, t^{U}, \]

then the p.g.f. for the non-null model is 

\[ G_1(t) = \frac{\sum_U P_0(U)\, t^{U} \phi^{U}}{\sum_U P_0(U)\,\phi^{U}} = \frac{G_0(\phi t)}{G_0(\phi)}. \]

For sequences with $r_1 + r_2$ small the exact frequency distribution of $S$ has been given by 
both Mann & Whitney (1947) and Haldane & Smith (1948). It may be calculated most 
easily perhaps by means of a generating function. If we consider the coefficient of $t^{r_2}$ in 
the expansion of 

\[ \prod_{i=1}^{r_1+r_2} (1 + h^{i} t) \]

this will be a polynomial in $h$.‡ The powers of $h$ will be the different values that $S$ may take 
and the numerical multiplying factor will be the frequency. For example, for $r_1 = 4$, 
$r_2 = 3$ we have 

\[ \prod_{i=1}^{7} (1 + h^{i} t), \]

and the coefficient of $t^{3}$ is 

\[ h^{6} + h^{7} + 2h^{8} + 3h^{9} + 4h^{10} + 4h^{11} + 5h^{12} + 4h^{13} + 4h^{14} + 3h^{15} + 2h^{16} + h^{17} + h^{18}. \]


† Previously Mallows defined $\phi^{2} = q/p$. 
‡ Explicit expression for this polynomial as a rational algebraic function is given, for instance, by 
P. A. MacMahon in Combinatory Analysis, 2, Art. 248. 










The distribution of $S$ under the null and the alternate hypothesis can be summarized as 

| $S$         | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | Total |
| $U$         | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | |
| $f_0(S)$    | 1 | 1 | 2 | 3 | 4 | 4 | 5 | 4 | 4 | 3 | 2 | 1 | 1 | 35 |
| $f_\phi(S)$ | $\phi^{12}$ | $\phi^{11}$ | $2\phi^{10}$ | $3\phi^{9}$ | $4\phi^{8}$ | $4\phi^{7}$ | $5\phi^{6}$ | $4\phi^{5}$ | $4\phi^{4}$ | $3\phi^{3}$ | $2\phi^{2}$ | $\phi$ | 1 | $\sum f_0(S)\,\phi^{18-S}$ |
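
As an illustrative aside, the null frequencies of $S$ and their $\phi$-weighted counterparts can be checked numerically. The sketch below is not the authors' procedure: it enumerates the possible sets of $y$ ranks directly, which gives the same counts as the coefficient extracted from the generating function, and weights each count by $\phi^{U}$ with $U = \tfrac12 r_2(r_2+1) + r_1 r_2 - S$. The function names are hypothetical.

```python
from itertools import combinations
from collections import Counter

def null_frequencies(r1, r2):
    """Frequency of each rank sum S of the r2 y's among r = r1 + r2 ranks."""
    r = r1 + r2
    return Counter(sum(ranks) for ranks in combinations(range(1, r + 1), r2))

def tilted_distribution(r1, r2, phi):
    """P(S) under the phi alternative, using weights f0(S) * phi**U."""
    f0 = null_frequencies(r1, r2)
    offset = r2 * (r2 + 1) // 2 + r1 * r2          # U = offset - S
    weights = {s: f * phi ** (offset - s) for s, f in f0.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in sorted(weights.items())}

f0 = null_frequencies(4, 3)
print([f0[s] for s in range(6, 19)])   # frequencies for S = 6, ..., 18
p = tilted_distribution(4, 3, 0.5)     # phi = 0.5 chosen only for illustration
```

Brute-force enumeration is only practical for short sequences; for longer ones the polynomial product itself, or the moment approximations discussed in the text, would be the more natural route.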








Given the critical region for $S$, therefore, the power of $S$ with regard to a given value of $\phi$ 
may be calculated. Table 1 (p. 178) gives the power of $S$ for several values of $\phi$ for the random 
sequences (5, 5), (6, 4) and (7, 3). 

When $r_1$ and $r_2$ become too large for the p.g.f. to be manipulated easily, other methods 
based on moment approximations are possible. The cumulant generating function of $S$ 
under the null hypothesis will be 

\[ K_0(h) = \log\Bigl\{ \text{coefficient of } t^{r_2} \text{ in } \prod_{j=1}^{r} (1 + t e^{jh}) \Big/ {}^{r}C_{r_2} \Bigr\}
          = \kappa_1 h + \kappa_2 \frac{h^{2}}{2!} + \kappa_3 \frac{h^{3}}{3!} + \ldots \]

Write $\theta = \log \phi$, 
whence the cumulant generating function under the alternate hypothesis is 

\[ K_\phi(h) = K_0(h + \theta) - K_0(\theta). \]

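Expand both sides of the equation and equate like powers of $h$. If $\kappa_\nu(\phi)$ denotes the $\nu$th 
cumulant of $S$ under the $\phi$ alternative, and $\kappa_\nu$ the $\nu$th cumulant under the null hypothesis 
we have 

\[ \kappa_\nu(\phi) = \kappa_\nu + \theta\kappa_{\nu+1} + \frac{\theta^{2}}{2!}\kappa_{\nu+2} + \frac{\theta^{3}}{3!}\kappa_{\nu+3} + \frac{\theta^{4}}{4!}\kappa_{\nu+4} + \frac{\theta^{5}}{5!}\kappa_{\nu+5} + \ldots \]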


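The first 12 cumulants under the null hypothesis have been tabled by Haldane & Smith 
(1948). The lower order cumulants given by them are as follows: 

\[ \kappa_1 = \tfrac12 r_2(r+1), \qquad \kappa_{2\nu+1} = 0 \quad (\nu \geq 1), \]

\[ \kappa_2 = \frac{r_1 r_2}{12}(r+1), \]

\[ \kappa_4 = -\frac{r_1 r_2 (r+1)}{120}\,[\,r(r+1) - r_1 r_2\,], \]

\[ \kappa_6 = \frac{r_1 r_2 (r+1)}{504}\,[\,r(r+1)(2r^{2} + 2r - 1) - r(4r+5)\, r_1 r_2 + 2 r_1^{2} r_2^{2}\,]. \]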


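$\kappa_{2\nu}$ will be of order $(2\nu + 1)$ in $r$ so that $\kappa_{2\nu}/(\kappa_2)^{\nu}$ will be of order $-(\nu - 1)$ in $r$. (This indicates 
the rapid approach to normality of the distribution of $S$ with increasing $r$ and $r_1/r_2$ fixed.) 
Under the alternate hypothesis we have 

\[ \kappa_{2\nu}(\phi) = \kappa_{2\nu} + \frac{\theta^{2}}{2!}\kappa_{2\nu+2} + \frac{\theta^{4}}{4!}\kappa_{2\nu+4} + \ldots, \]

\[ \kappa_{2\nu+1}(\phi) = \theta\kappa_{2\nu+2} + \frac{\theta^{3}}{3!}\kappa_{2\nu+4} + \ldots, \]

while 

\[ \kappa_1(\phi) = \kappa_1 + \theta\kappa_2 + \frac{\theta^{3}}{3!}\kappa_4 + \ldots \]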
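
The series for the first cumulant can be compared with an exact calculation on a short sequence. The following is a minimal sketch, assuming that the alternative reweights each value of $S$ by $e^{\theta S}$ (the sign convention attached to $\theta$ is an assumption of this sketch, as are the function names); it uses the null cumulants $\kappa_1 = \tfrac12 r_2(r+1)$, $\kappa_2 = r_1 r_2(r+1)/12$ and $\kappa_4 = -r_1 r_2(r+1)[r(r+1) - r_1 r_2]/120$.

```python
import math
from itertools import combinations
from collections import Counter

def null_pmf(r1, r2):
    """Null probabilities of the rank sum S, by direct enumeration."""
    r = r1 + r2
    counts = Counter(sum(c) for c in combinations(range(1, r + 1), r2))
    n = sum(counts.values())
    return {s: f / n for s, f in counts.items()}

def tilted_mean(pmf, theta):
    """Exact mean of S when each value is reweighted by exp(theta * S)."""
    w = {s: p * math.exp(theta * s) for s, p in pmf.items()}
    z = sum(w.values())
    return sum(s * v for s, v in w.items()) / z

def series_mean(r1, r2, theta):
    """kappa_1 + theta*kappa_2 + theta^3/3! * kappa_4 (odd null cumulants vanish)."""
    r = r1 + r2
    k1 = r2 * (r + 1) / 2
    k2 = r1 * r2 * (r + 1) / 12
    k4 = -r1 * r2 * (r + 1) / 120 * (r * (r + 1) - r1 * r2)
    return k1 + theta * k2 + theta ** 3 / 6 * k4

exact = tilted_mean(null_pmf(5, 5), 0.1)
approx = series_mean(5, 5, 0.1)
print(exact, approx)   # the two values should agree closely for small theta
```

The agreement deteriorates as $|\theta|$ grows, since the truncated series omits the terms in $\kappa_6$ and beyond.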







Using the moments of $S$ under $\phi$ two procedures are possible. We may take a suitable 
functional form and using as many cumulants as desired estimate the power of $S$ under the 
given $\phi$ alternative. A second, and simple, procedure is to take advantage of the fact that 
the distribution of $S$ becomes quickly approximately normal with increasing $r$. We may 
therefore assume, for small departures of $\phi$ from unity, that $S$ is also normally distributed 
under $\phi$ with mean and variance 

\[ \kappa_1(\phi) = \tfrac12 r_2(r+1) + \tfrac{1}{12}\theta\, r_1 r_2 (r+1), \]
\[ \kappa_2(\phi) = \tfrac{1}{12} r_1 r_2 (r+1), \]

respectively. The power of the test may be calculated in the usual way. Because these 
approximations are adequate for $\phi$ not very different from unity we do not discuss here 
different approximation series which may be obtained for the moments as functions of $\phi$. It 
is clear that several variants of the procedure which we have described are possible, chiefly 
through the expansion of $\theta$. 

Instead of $S$ there are several other criteria which have been proposed for testing for 
randomness in the sequence. We consider here two tests which might be applicable in the $\phi$ 
situation which we envisage, namely, Mood's median test (Mood, 1950, p. 395) and the 
runs test. Since $S$ is sufficient for $\phi$ neither criterion can be more powerful than $S$, but it is 
of interest to compare their power curves under the same set of $\phi$ alternatives. We take 
Mood's median test first and for the sake of example assume an even number of elements 
in the sequence. The discussion is easily paralleled for an odd number. The $r$ (even) 
observations are ranked, divided at the median and a 2 x 2 table drawn up. 














| $r = r_1 + r_2$  | Number of $x$'s | Number of $y$'s | Totals |
| Below the median | $\tfrac12 r - b$ | $b$ | $\tfrac12 r$ |
| Above the median | $b + \tfrac12(r_1 - r_2)$ | $r_2 - b$ | $\tfrac12 r$ |
| Totals           | $r_1$ | $r_2$ | $r$ |











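Under the null hypothesis it is immediate that 

\[ p(b) = \frac{{}^{r_2}C_{b}\;{}^{r_1}C_{\frac12 r - b}}{{}^{r}C_{\frac12 r}} = \frac{{}^{\frac12 r}C_{b}\;{}^{\frac12 r}C_{r_2 - b}}{{}^{r}C_{r_2}} \]

and 

\[ \mathcal{E}(b^{(\nu)}) = \frac{r_2^{(\nu)}\,(\tfrac12 r)^{(\nu)}}{r^{(\nu)}}. \]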


The critical region is usually taken as the sum of both tails of the distribution. 
The power function for $b$ can be built up from a consideration of the bivariate distribution 
of $S$ and $b$. We take the generating function 

\[ h^{\frac12 r r_2} \prod_{i=1}^{\frac12 r} (1 + h^{i} t) \prod_{j=1}^{\frac12 r} (1 + h^{j} w). \]

The coefficient of $t^{b} w^{r_2 - b}$ multiplied by $h^{-\frac12 r b}$ will, for a given value of $b$, generate the frequency 
distribution of $S$ for this fixed $b$. For example for the sequence $r_1 = 4 = r_2$ we take the 
generating function 

\[ h^{16} \prod_{i=1}^{4} (1 + h^{i} t) \prod_{j=1}^{4} (1 + h^{j} w) \]
and obtain the bivariate table: 

| $S$     | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | Total |
| $U$     | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | |
| $b = 0$ | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | 1 | 1 |
| $b = 1$ | . | . | . | . | . | . | . | . | . | 1 | 2 | 3 | 4 | 3 | 2 | 1 | . | 16 |
| $b = 2$ | . | . | . | . | 1 | 2 | 5 | 6 | 8 | 6 | 5 | 2 | 1 | . | . | . | . | 36 |
| $b = 3$ | . | 1 | 2 | 3 | 4 | 3 | 2 | 1 | . | . | . | . | . | . | . | . | . | 16 |
| $b = 4$ | 1 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | 1 |
| Total   | 1 | 1 | 2 | 3 | 5 | 5 | 7 | 7 | 8 | 7 | 7 | 5 | 5 | 3 | 2 | 1 | 1 | 70 |




















For brevity let the zero suffix denote the null hypothesis distribution. When $r$ is even, 

\[ p_0(S \mid b) = p_0\bigl(r_2(r+1) - S \mid r_2 - b\bigr), \]

so that the symmetry of the table in the example above is a general result for even sequences 
and a median dichotomy no matter what the composition of the sequence. The distribution 
of $b$ under the $\phi$ alternative is easily written down. 

\[ p(b \mid H_1) \propto \sum_S p_0(S, b)\,\phi^{-S} = \sum_S p_0(b)\, p_0(S \mid b)\,\phi^{-S} = p_0(b) \sum_S p_0(S \mid b)\,\phi^{-S}. \]


The required probabilities for $b$ under the alternate hypothesis are therefore proportional to 
the coefficient of $t^{b} w^{r_2 - b}$ $(b = 0, 1, \ldots, r_2)$ in the expansion of 

\[ \phi^{\frac12 r_2(r_2+1) + r_1 r_2 - 4(r_2 - b)} \prod_{i=1}^{4} (1 + \phi^{-i} t) \prod_{j=1}^{4} (1 + \phi^{-j} w). \]


This means that each $S$ array in the bivariate table is multiplied by the weights which are 
given to the marginal totals of $S$ and the table is added along the $b$ arrays. For example, in 
the illustrative table 

\[ P\{b = 2 \mid \phi\} \propto \phi^{4} + 2\phi^{5} + 5\phi^{6} + 6\phi^{7} + 8\phi^{8} + 6\phi^{9} + 5\phi^{10} + 2\phi^{11} + \phi^{12}, \]


and similarly for the others. The distribution of $b$ under the alternate $\phi$ hypothesis was 
calculated numerically for one of the sequences (5, 5) chosen for $S$. Some difficulty was 
experienced in making a direct comparison of the power because of the limitations imposed 
by $S$ and $b$ being discrete with the consequent inability to choose the same size for the 
critical regions. In order to make direct comparisons it was assumed, for purposes of 
calculation only, that $S$ and $b$ were distributed rectangularly from $\tfrac12$ below the discrete value 
to $\tfrac12$ above. The small number of values which $b$ can take precludes any comparison with $S$ 
for sequences (6, 4) and (7, 3). 
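
The weighting of the bivariate table can be illustrated on the sequence $r_1 = 4 = r_2$. The sketch below is not the calculation used for Table 1 (it omits the rectangular-distribution device): it enumerates every arrangement, records $S$ and $b$, weights each cell by $\phi^{U}$ with $U = \tfrac12 r_2(r_2+1) + r_1 r_2 - S$, and adds along the $b$ arrays. It assumes $r$ even, and the function names are hypothetical.

```python
from itertools import combinations
from collections import defaultdict

def joint_S_b(r1, r2):
    """Null frequencies f0[(S, b)]: S = rank sum of the y's, b = y's below the median."""
    r = r1 + r2
    half = r // 2                                   # assumes r is even
    table = defaultdict(int)
    for ranks in combinations(range(1, r + 1), r2):
        b = sum(1 for k in ranks if k <= half)
        table[(sum(ranks), b)] += 1
    return table

def b_distribution(r1, r2, phi):
    """P(b | phi): weight each cell by phi**U and add along the b arrays."""
    offset = r2 * (r2 + 1) // 2 + r1 * r2           # U = offset - S
    weights = defaultdict(float)
    for (s, b), f in joint_S_b(r1, r2).items():
        weights[b] += f * phi ** (offset - s)
    total = sum(weights.values())
    return {b: w / total for b, w in sorted(weights.items())}

table = joint_S_b(4, 4)
# Coefficients of phi**U in the b = 2 array (U = 26 - S for this sequence).
row_b2 = {26 - s: f for (s, b), f in table.items() if b == 2}
print(dict(sorted(row_b2.items())))
print(b_distribution(4, 4, 0.5))                    # phi = 0.5 is illustrative only
```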







The regression of $b$ on $S$ in the null-hypothesis bivariate table is disjointed, but the 
regression of $S$ on $b$ is linear. If the dichotomy of the sequence is made between the $R$th and 
the $(R+1)$st elements of the sequence then 

\[ \mathcal{E}(S \mid b) = -\tfrac12 b r + \tfrac12 r_2 (r + R + 1) \]

and 

\[ \sigma^{2}(S \mid b) = \tfrac{1}{12}\{ -b^{2}(R+1) - (r_2 - b)^{2}(r - R + 1) + bR(R+1) + (r_2 - b)(r - R)(r - R + 1) \}. \]

For the particular case of a median dichotomy that we have been considering $r = 2R$ and 

\[ \mathcal{E}(S \mid b) = -\tfrac12 b r + \tfrac14 r_2 (3r + 2), \]
\[ \sigma^{2}(S \mid b) = \tfrac{1}{48}(r + 2)\,\bigl(4b(r_2 - b) + r_2(r_1 - r_2)\bigr). \]

The same argument can be used here as we used for finding the cumulants of $S$ under $\phi$, 
and we have, remembering $\theta = \log \phi$, that 

\[ \mathcal{E}(S \mid b, \theta) = -\tfrac12 b r + \tfrac12 r_2 (r + R + 1) + \tfrac{1}{12}\theta\,\{ b(R+1)(R - b) + (r_2 - b)(r - R + 1)(r_1 - R + b) \} \]

in general and in particular 

\[ \mathcal{E}(S \mid b, \theta, r = 2R) = -\tfrac12 b r + \tfrac14 r_2 (3r + 2) + \tfrac{1}{48}\theta\,(r + 2)\,\bigl(4b(r_2 - b) + r_2(r_1 - r_2)\bigr). \]

To this degree of approximation the variance remains the same as in the null case. It will 
be noticed that the regression is approximately quadratic for $\theta$ small. As $\theta$ increases the 
higher order cumulants will play a part and the divergence from linearity will become more 
pronounced. 

Under the null hypothesis with the split $R + R = r$, since 

\[ \sigma_S^{2} = \frac{r_1 r_2 (r+1)}{12}, \qquad \sigma_b^{2} = \frac{r_1 r_2}{4(r-1)}, \]

the correlation between $S$ and $b$ is 

\[ \rho_0 = -\frac{r}{2}\left(\frac{3}{r^{2}-1}\right)^{\frac12} \to -\frac{\sqrt{3}}{2} \]

as $r$ increases. The correlation between $S$ and $b$ under the alternate hypothesis does not 
appear to have any meaning (because of the non-linear regression). We would expect, 
however, given the high correlation between $b$ and $S$ in the null case that $b$ would be nearly 
as efficient as $S$, which is not however the case. 
Wald & Wolfowitz (1940) suggested $T$, the number of runs in the sequence, as a criterion 
to be used to test the randomness of a sequence formed by two samples. For our present 
problem we have that 

\[ P\{T = 2t\} = 2\;{}^{r_1-1}C_{t-1}\;{}^{r_2-1}C_{t-1} \Big/ {}^{r}C_{r_2}, \]

\[ P\{T = 2t + 1\} = P\{T = 2t\}\,\frac{(r - 2t)}{2t} \qquad (t = 1, 2, \ldots, r_2). \]
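
For reference, the null distribution of the number of runs can be tabulated from these two expressions. The sketch below is illustrative only: it implements $P\{T = 2t\} = 2\,{}^{r_1-1}C_{t-1}\,{}^{r_2-1}C_{t-1}/{}^{r}C_{r_2}$ and $P\{T = 2t+1\} = P\{T = 2t\}(r - 2t)/(2t)$, and the function name is hypothetical.

```python
from math import comb

def runs_pmf(r1, r2):
    """Null distribution of T, the number of runs, for r1 x's and r2 y's."""
    r = r1 + r2
    total = comb(r, r2)
    pmf = {}
    for t in range(1, min(r1, r2) + 1):
        even = 2 * comb(r1 - 1, t - 1) * comb(r2 - 1, t - 1) / total
        pmf[2 * t] = even
        odd = even * (r - 2 * t) / (2 * t)
        if odd > 0:                      # T = 2t + 1 is impossible when r = 2t
            pmf[2 * t + 1] = odd
    return pmf

pmf = runs_pmf(4, 4)
print({T: round(p * 70) for T, p in sorted(pmf.items())})   # frequencies out of 70
```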





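$S$ and $b$ can be used for one- or two-tailed tests, but it is the two-tailed tests with $S$ or $b$ 
which will be comparable with the one-tailed test with $T$. The bivariate distribution of $T$ 
and $S$ can be enumerated quickly for small sequences. For example for $r_1 = 4 = r_2$ we 
have 

| $S$     | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | Total |
| $U$     | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | |
| $T = 2$ | 1 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | 1 | 2 |
| $T = 3$ | . | . | . | . | 2 | . | . | . | 2 | . | . | . | 2 | . | . | . | . | 6 |
| $T = 4$ | . | 1 | 2 | 2 | 1 | . | 2 | 1 | . | 1 | 2 | . | 1 | 2 | 2 | 1 | . | 18 |
| $T = 5$ | . | . | . | . | . | 2 | 4 | 2 | 2 | 2 | 4 | 2 | . | . | . | . | . | 18 |
| $T = 6$ | . | . | . | 1 | 2 | 3 | . | 2 | 2 | 2 | . | 3 | 2 | 1 | . | . | . | 18 |
| $T = 7$ | . | . | . | . | . | . | . | 2 | 2 | 2 | . | . | . | . | . | . | . | 6 |
| $T = 8$ | . | . | . | . | . | . | 1 | . | . | . | 1 | . | . | . | . | . | . | 2 |
| Total   | 1 | 1 | 2 | 3 | 5 | 5 | 7 | 7 | 8 | 7 | 7 | 5 | 5 | 3 | 2 | 1 | 1 | 70 |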





To find the probability distribution of $T$ under the $\phi$ alternative we weight the $S$ arrays as 
before and add up the $T$ arrays. Thus, for example, 

\[ P\{T = 5\} \propto 2\phi^{5} + 4\phi^{6} + 2\phi^{7} + 2\phi^{8} + 2\phi^{9} + 4\phi^{10} + 2\phi^{11} \]

and so on. It will be seen on referring to Table 1 that $T$ is inefficient compared with $S$ for 
the three illustrative sequences. Under the $\phi$ alternative the power of $T$ does not vary very 
much with the composition of the sequence. It is clear from considerations of symmetry 
that $\mathcal{E}(S \mid T)$ is constant whatever $T$. The regression of $T$ on $S$ does not appear to have an 
easily calculable form. 
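
The bivariate enumeration of $T$ and $S$ is short enough to reproduce by machine. The following is a sketch under the same weighting as before ($\phi^{U}$ with $U = \tfrac12 r_2(r_2+1) + r_1 r_2 - S$), with hypothetical function names; it returns, for a chosen $T$, the coefficients of the powers of $\phi$ in the unnormalised probability.

```python
from itertools import combinations
from collections import defaultdict

def joint_T_S(r1, r2):
    """Null frequencies f0[(T, S)] over all arrangements of r1 x's and r2 y's."""
    r = r1 + r2
    table = defaultdict(int)
    for ranks in combinations(range(1, r + 1), r2):
        is_y = [k in ranks for k in range(1, r + 1)]
        # A new run starts wherever adjacent elements differ in kind.
        runs = 1 + sum(is_y[k] != is_y[k - 1] for k in range(1, r))
        table[(runs, sum(ranks))] += 1
    return table

def T_weights(r1, r2, t):
    """Coefficients of phi**U in the unnormalised P{T = t} under the phi alternative."""
    offset = r2 * (r2 + 1) // 2 + r1 * r2           # U = offset - S
    cells = joint_T_S(r1, r2)
    return {offset - s: f for (runs, s), f in sorted(cells.items()) if runs == t}

print(dict(sorted(T_weights(4, 4, 5).items())))
```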

The three criteria $S$, $b$ and $T$ whose powers we have discussed against a $\phi$ alternative were 
originally proposed to test for possible differences in location of two populations by means 
of tests for randomness in the sequence of two alternatives. It seemed interesting to us to 
try to vary the $\phi$ model under the alternate hypothesis and, by so doing, produce criteria 
which might be used also as tests for dispersion differences in two populations. Let us suppose 
that there are $r_1$ $x$'s and $r_2$ $y$'s and that the latter fall into two classes, I and II, of $l$ and $m$, 
respectively ($l + m = r_2$). We further suppose the true situation is that the $l$ $y$'s should all 
be ranked below the $r_1$ $x$'s which themselves should be ranked below the $m$ $y$'s. Each $x$ is 
compared with each $y$ and we assume that in all comparisons there is a probability $p \neq \tfrac12$ 
that the ranker correctly assigns relative ranks to the pair, the $l$ $y$'s always being ranked 
with certainty below the $m$ $y$'s. As before we suppose that the result of the comparisons is 
a consistent sequence. Let $C$ be the number of paired comparisons the elements of which 
are in the correct relative order. Then the probability of any sequence with $C$ correct is 
proportional to 

\[ p^{C} q^{\,r_1 r_2 - C} \propto \phi^{-C}, \]

where $\phi = q/p$ as before. Now $C$ will be equal to the number of $x$'s ranked above the $y$'s 
of Class I plus the number of $x$'s ranked below the $y$'s of Class II. The number of $x$'s ranked 
above the $y$'s of Class I will be equal to $r_1 l$ minus the number of $x$'s ranked below the $y$'s of 
Class I. Writing $u_1$ and $u_2$ for the number of times an $x$ is ranked above a $y$ of Class I and 
of Class II respectively, we may write this as 

\[ r_1 r_2 - C = r_1 l - u_1 + u_2, \]

or 

\[ C = r_1 m - u_2 + u_1. \]

Now 

\[ u_1 = \tfrac12 l(l+1) + r_1 l - S_1, \]

where $S_1$ is the sum of the ranks of the $l$ $y$'s, and 

\[ u_2 = \tfrac12 m(m+1) + r_1 m - (S_2 - lm), \]

where $S_2$ is the sum of the ranks of the $m$ $y$'s, so that 

\[ C = S_2 - S_1 + \text{constant}. \]

It follows that the difference between the sum of the ranks of the upper $m$ $y$'s and the sum 
of the ranks of the lower $l$ $y$'s is sufficient for $\phi$. 

It will be recognized that the $\phi$ model of the previous section will lead to a situation in 
which the $y$ variable seems to have a greater dispersion than the $x$ variable. The criterion 
using the rank differences will be, therefore, a linear test for differences in dispersion between 
two variables. For $l$ known the test reduces to finding the difference between the sum of the 
upper $(r_2 - l)$ and of the lower $l$ ranks of the $y$ elements. If we write 

\[ S^* = S_2 - S_1 \]

then the generating function for $S^*$ is the coefficient of $t^{l} w^{m}$ in 

\[ \sum_{L=l}^{r-m} t h^{-L} \prod_{i=1}^{L-1} (1 + t h^{-i}) \prod_{j=L+1}^{r} (1 + w h^{j}), \]

$L$ being the rank of the $l$th $y$. 


Thus for a sequence of $r_1 = 4 = r_2$ and $l = 2$ the distribution of $S^*$ is 

| $S^*$    | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Total |
| $f(S^*)$ | 5 | 8 | 13 | 14 | 14 | 8 | 5 | 2 | 1 | 70 |














A small value of $S^*$ denotes possibly that the dispersion of the $y$'s is too small, a large value 
of $S^*$ that the dispersion is possibly too big. The power function is obtained by weighting 
$f(S^*)$ proportionally to $\phi^{-S^*}$. Thus, in the example immediately above the distribution of 
$S^*$ under the $\phi$ alternative is, 

| $S^*$         | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| $f_\phi(S^*)$ | $5\phi^{8}$ | $8\phi^{7}$ | $13\phi^{6}$ | $14\phi^{5}$ | $14\phi^{4}$ | $8\phi^{3}$ | $5\phi^{2}$ | $2\phi$ | 1 |

the actual power function depending on the size of the critical region. Calculation of this 
power function for a sequence of composition (5, 5) is given in Table 2 (p. 178). 
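
The frequencies of $S^*$ for a short sequence with $l$ known can be obtained by listing every set of $y$ ranks. The sketch below is illustrative rather than the authors' method (it enumerates instead of expanding a generating function), and its function names are hypothetical; the second function applies the weights $\phi^{-S^*}$ and normalises.

```python
from itertools import combinations
from collections import Counter

def s_star_frequencies(r1, r2, l):
    """Null frequencies of S* = (sum of the upper r2 - l y ranks) - (sum of the lower l)."""
    r = r1 + r2
    freq = Counter()
    for ranks in combinations(range(1, r + 1), r2):   # each tuple is in ascending order
        freq[sum(ranks[l:]) - sum(ranks[:l])] += 1
    return dict(sorted(freq.items()))

def s_star_tilted(r1, r2, l, phi):
    """Distribution of S* with each frequency weighted by phi**(-S*)."""
    freq = s_star_frequencies(r1, r2, l)
    w = {s: f * phi ** (-s) for s, f in freq.items()}
    total = sum(w.values())
    return {s: v / total for s, v in w.items()}

print(s_star_frequencies(4, 4, 2))
```

Given a critical region (a set of values of $S^*$), the power at a given $\phi$ is then the sum of the corresponding entries of `s_star_tilted`.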

The p.g.f. of $S^*$ with increasing sequence length quickly becomes difficult to manipulate. 
We therefore follow the usual procedure and investigate approximations to the 
distribution of $S^*$ through its moments. These may be obtained by an appeal to first principles. 
Let $z_1, z_2, \ldots, z_{r_2}$ be the ranks of the $y$'s in ascending order, and accordingly 

\[ S^* = -\sum_{i=1}^{l} z_i + \sum_{i=l+1}^{r_2} z_i. \]










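The problem is therefore one of finding the moments of the $i$th largest in a sample of $r_2$ 
drawn from a finite population of the first $r$ natural numbers. We have that 

\[ P\{z_i = R_i\} = {}^{R_i-1}C_{i-1}\;{}^{r-R_i}C_{r_2-i} \Big/ {}^{r}C_{r_2} \qquad (i \leq R_i \leq r - r_2 + i), \]

\[ P\{(z_i = R_i)(z_j = R_j)\} = {}^{R_i-1}C_{i-1}\;{}^{R_j-R_i-1}C_{j-i-1}\;{}^{r-R_j}C_{r_2-j} \Big/ {}^{r}C_{r_2}, \]

\[ i < j, \quad R_i < R_j, \quad R_i + j - i \leq R_j \leq r - r_2 + j \quad (i \leq R_i \leq r - r_2 + i), \]

and so on. It is easy to see that 

\[ \mathcal{E}(z_i) = \frac{i(r+1)}{r_2+1}, \qquad \mathcal{E}\{z_i(z_i+1)\} = \frac{i(i+1)(r+1)(r+2)}{(r_2+1)(r_2+2)}, \]

\[ \mathcal{E}\{z_i(z_j+1)\} = \frac{i(j+1)(r+1)(r+2)}{(r_2+1)(r_2+2)} \qquad (i < j), \]

from which we have 

\[ \sigma^{2}_{z_i} = \frac{r_1(r+1)\,i\,(r_2 - i + 1)}{(r_2+1)^{2}(r_2+2)}, \qquad \operatorname{cov}(z_i, z_j) = \frac{r_1(r+1)\,i\,(r_2 - j + 1)}{(r_2+1)^{2}(r_2+2)} \quad (i < j). \]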


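The mean and variance of $S_1$, $S_2$ and $S^*$ are accordingly, writing $x^{(k)} = x(x-1)\cdots(x-k+1)$, 

\[ \mathcal{E}(S_1) = \frac{r+1}{r_2+1}\,\tfrac12 (l+1)^{(2)}, \qquad \mathcal{E}(S_2) = \frac{r+1}{r_2+1}\{(r_2+1)(r_2-l) - \tfrac12 (r_2-l+1)^{(2)}\}, \]

\[ \mathcal{E}(S^*) = \frac{r+1}{r_2+1}\{(r_2+1)(r_2-l) - \tfrac12 (l+1)^{(2)} - \tfrac12 (r_2-l+1)^{(2)}\}, \]

\[ \sigma^{2}_{S_1} = \frac{r_1(r+1)}{(r_2+1)^{2}(r_2+2)}\{\tfrac12 r_2 (l+1)^{(2)} + \tfrac13 (r_2-2)(l+1)^{(3)} - \tfrac14 (l+1)^{(4)}\}, \]

\[ \sigma^{2}_{S_2} = \frac{r_1(r+1)}{(r_2+1)^{2}(r_2+2)}\{\tfrac12 r_2 (r_2-l+1)^{(2)} + \tfrac13 (r_2-2)(r_2-l+1)^{(3)} - \tfrac14 (r_2-l+1)^{(4)}\}, \]

\[ \operatorname{cov}(S_1, S_2) = \frac{r_1(r+1)}{4(r_2+1)^{2}(r_2+2)}\,(l+1)^{(2)} (r_2-l+1)^{(2)}, \]

\[ \sigma^{2}_{S^*} = \frac{r_1(r+1)}{(r_2+1)^{2}(r_2+2)}\{\tfrac12 r_2 [(l+1)^{(2)} + (r_2-l+1)^{(2)}] + \tfrac13 (r_2-2)[(l+1)^{(3)} + (r_2-l+1)^{(3)}] \]
\[ \qquad\qquad - \tfrac14 [(l+1)^{(4)} + 2(l+1)^{(2)}(r_2-l+1)^{(2)} + (r_2-l+1)^{(4)}]\}. \]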


The $\nu$th cumulant of $S^*$, $\kappa_\nu$, is of order $\nu + 1$ in $r_2$, so that the standardized $\nu$th cumulant 
is of order $1 - \tfrac12\nu$ in $r_2$. For $r_2$ a fixed proportion of $r_1$, with increasing $r_1$ the standardized 
cumulants of order greater than 2 will tend to zero and we may therefore expect the 
distribution of $S^*$ to be approximately normal. To illustrate this we set out the true distribution 
of $S^*$ for a sequence of composition (5, 5) with $l = 3$, and the corresponding frequencies 
obtained from a normal curve which has the mean and variance of $S^*$. 










Comparison of exact and normal frequencies of distribution 
of $S^*$ for a sequence of (5, 5) with $l = 3$ 

| $S^*$    | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | Total |
| $f(S^*)$ | 1 | 2 | 5 | 9 | 16 | 25 | 32 | 36 | 36 | 32 | 25 | 16 | 9 | 5 | 2 | 1 | 252 |
| Normal   | 1.2 | 2.0 | 4.7 | 9.3 | 16.1 | 24.2 | 31.9 | 36.6 | 36.6 | 31.9 | 24.2 | 16.1 | 9.3 | 4.7 | 2.0 | 1.2 | 252 |
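
A comparison of this kind can be sketched as follows. The code is illustrative and its names are hypothetical. It evaluates the mean and variance of $S^*$ from the order-statistic results, here written in the equivalent form $\mathcal{E}(S^*) = (r+1)\{r_2(r_2+1) - 2l(l+1)\}/\{2(r_2+1)\}$ with the variance built from $\operatorname{cov}(z_i, z_j) \propto \min(i,j)\,(r_2 + 1 - \max(i,j))$, checks them against direct enumeration, and computes a normal frequency over the interval $S^* \pm \tfrac12$ (that interval convention is an assumption of this sketch).

```python
import math
from itertools import combinations

def s_star_moments_formula(r1, r2, l):
    """Mean and variance of S* from the order-statistic formulae (l known)."""
    r, m = r1 + r2, r2 - l
    mean = (r + 1) / (2 * (r2 + 1)) * (r2 * (r2 + 1) - 2 * l * (l + 1))
    def bracket(k):   # sum over i, j <= k of min(i, j) * (r2 + 1 - max(i, j))
        return k * (k + 1) * (2 * (r2 + 1) * (2 * k + 1) - 3 * k * (k + 1)) / 12
    c = r1 * (r + 1) / ((r2 + 1) ** 2 * (r2 + 2))
    var = c * (bracket(l) + bracket(m) - l * (l + 1) * m * (m + 1) / 2)
    return mean, var

def s_star_moments_exact(r1, r2, l):
    """Mean and variance of S* by enumerating every set of y ranks."""
    r = r1 + r2
    vals = [sum(c[l:]) - sum(c[:l]) for c in combinations(range(1, r + 1), r2)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var

def normal_frequency(s, mean, var, total):
    """Expected frequency of the value s under a normal curve, over s - 1/2 to s + 1/2."""
    sd = math.sqrt(var)
    cdf = lambda x: 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))
    return total * (cdf(s + 0.5) - cdf(s - 0.5))

mean, var = s_star_moments_formula(5, 5, 3)
print(mean, var, normal_frequency(5, mean, var, 252))
```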












































The true and approximate distributions are close and the normal distribution can obviously 
be used for estimating the distribution of $S^*$ in short sequences when the true distribution 
is not too asymmetrical. The approximation should not, however, be taken to absurd 
lengths. For example when $r_2 = 2$ and $l = 1$ no matter what $r_1$ the distribution of $S^*$ will 
be a right-angled triangle. It will be in fact, as follows: 

Distribution of $S^*$ for any sequence when $r_2 = 2$ 

| $S^*$    | 1 | 2 | 3 | 4 | ... | $r-1$ | Total |
| $f(S^*)$ | $r-1$ | $r-2$ | $r-3$ | $r-4$ | ... | 1 | ${}^{r}C_{2}$ |

and no normal approximation is possible. For $r_2 \geq 3$ even with $l = 1$ the distribution of $S^*$ 
is reasonably approximated to by the normal with increasing $r$. 

It is obvious that it will be rare in practical problems for $l$ to be known. If $l$ is not known 
then two possibilities suggest themselves as tests for dispersion under the null hypothesis: 
we may dichotomize the sequence as near the median of the $y$ observations as we can, or 
we may dichotomize the sequence near the median of all $r$ observations. 


(1) Dichotomy close to the median y of the $r_2$ observations 
If $r_2$ is even we can divide at the median and take $r_2 - l = l$. If $r_2$ is odd we may divide 
by taking $2l = r_2 - 1$, $2(r_2 - l) = r_2 + 1$. Tables of the distribution of $S^*$, under such 
dichotomies, have been calculated and are given in Table 3 (p. 179). For reasonable sized sequences 
$S^*$ may be assumed to be normally distributed with mean and variance: 

(i) $r_2$ even: $\mathcal{E}(S^*) = \dfrac{r_2^{2}(r+1)}{4(r_2+1)}$, $\sigma^{2}_{S^*} = \dfrac{r_1 r_2 (r+1)}{48}\left\{1 + \dfrac{3}{(r_2+1)^{2}}\right\}$, 

(ii) $r_2$ odd: $\mathcal{E}(S^*) = \tfrac14 (r+1)(r_2+1)$, $\sigma^{2}_{S^*} = \dfrac{r_1 r_2 (r+1)}{48}\left\{1 + \dfrac{9}{r_2(r_2+2)}\right\}$. 
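
The moments for the median-$y$ dichotomy can be checked against enumeration on short sequences. In the sketch below the closed forms are those obtained by putting $l = \tfrac12 r_2$ ($r_2$ even) or $l = \tfrac12(r_2 - 1)$ ($r_2$ odd) in the general mean and variance of $S^*$; they are stated in the code comments, and the function names are hypothetical.

```python
from itertools import combinations

def median_y_moments(r1, r2):
    """Mean and variance of S* when the y's are split at their own median.

    r2 even: mean = r2^2 (r+1) / (4 (r2+1)),  var = base * (1 + 3 / (r2+1)^2)
    r2 odd : mean = (r+1)(r2+1) / 4,          var = base * (1 + 9 / (r2 (r2+2)))
    with base = r1 r2 (r+1) / 48.
    """
    r = r1 + r2
    base = r1 * r2 * (r + 1) / 48
    if r2 % 2 == 0:
        return r2 ** 2 * (r + 1) / (4 * (r2 + 1)), base * (1 + 3 / (r2 + 1) ** 2)
    return (r + 1) * (r2 + 1) / 4, base * (1 + 9 / (r2 * (r2 + 2)))

def exact_moments(r1, r2):
    """Same moments by enumeration, with l = r2 // 2 y's in the lower set."""
    r, l = r1 + r2, r2 // 2
    vals = [sum(c[l:]) - sum(c[:l]) for c in combinations(range(1, r + 1), r2)]
    mean = sum(vals) / len(vals)
    return mean, sum((v - mean) ** 2 for v in vals) / len(vals)

for seq in [(4, 4), (5, 5), (6, 3)]:
    print(seq, median_y_moments(*seq), exact_moments(*seq))
```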


(2) Dichotomy close to the median of all the $r$ observations 

When $r$ is even the sequence can be divided by the median into two equal parts. When 
$r$ is odd we have taken the line of dichotomy between the $R$th and the $(R+1)$st ranked 
observation so that $R = \tfrac12(r-1)$. The difference of the sum of the upper set of $y$ observations 
and the sum of the lower set of $y$ observations will be $S^*$. Tables of the distribution of $S^*$ 
under these conditions have already been given by us (Barton & David, 1958). For 
reasonable $r$, $S^*$ may be assumed to be normally distributed with mean and variance: 

(i) $r$ even: $\mathcal{E}(S^*) = \dfrac{r_2(r+2)}{4}$, $\sigma^{2}_{S^*} = \dfrac{r_1 r_2 (r+1)}{48}\left(1 - \dfrac{3}{r^{2}-1}\right)$, 

(ii) $r$ odd: $\mathcal{E}(S^*) = \dfrac{r_2(r+1)^{2}}{4r}$, $\sigma^{2}_{S^*} = \dfrac{r_1 r_2 (r+1)}{48}\left(1 + \dfrac{3}{r^{2}}\right)$. 

Either of the two statistics which we propose will be useful as a quick test for difference 
of dispersion for two elements in a sequence, even if the basic conditions of the model are 
not satisfied. We illustrate Method I on data quoted by Freeman (1953) concerning the 
distribution of the disease of nettlehead in hop plants which have been planted in a 
rectangular lattice design. According to Freeman some of the plants had died, but we assume 
that they are alive and healthy in order not to prejudice the issue. This assumption makes 
the worst conditions for the application of the test. The layout is given in Table 4 (p. 180), 
where N denotes a diseased plant and the vacancies are all free of disease. If the plants are 
attacked randomly by the disease then the healthy and diseased plants in any one array 
should together form a random sequence of two alternatives. If, on the other hand, the 
disease tends to spread from plant to plant then the dispersion in the sequence will be less 
than might be expected on the hypothesis of randomness. The analysis of each vertical 
array using Method I shows that the dispersion is significantly less than it should be in 
6 of the 11 arrays. The mean of the ratios 

\[ \frac{S^* - \mathcal{E}(S^*) + \tfrac12}{\sigma_{S^*}} \]

is -1.5514 and this itself may be assumed to be normally distributed with variance 1/11. 
The corresponding unit normal deviate is -5.1455 which implies that in the whole lattice 
design the dispersion is very much less than might be expected on the hypothesis of 
randomness of the disease. The analysis for $T$, the number of groups, is also given. Mean $T$ is 
equivalent to a unit normal deviate of -3 and so is significant. 
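
The step from the mean ratio to the unit normal deviate is a rescaling by the square root of the number of arrays, since the mean of 11 independent standardized ratios has variance 1/11. A minimal sketch follows, with a hypothetical function name; it starts from the rounded mean quoted in the text, so the last digit may differ from the printed value.

```python
import math

def combined_deviate(mean_ratio, n_arrays):
    """Unit normal deviate for the mean of n independent standardised ratios."""
    return mean_ratio * math.sqrt(n_arrays)

z = combined_deviate(-1.5514, 11)
print(round(z, 3))
```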

Many variants of the models which we have set up are possible, though most of them do 
not lead to sufficient statistics. The variant of the previous probability model which is 
possibly the most interesting is when we omit the restriction that the judge is able to place 
the $l$ $y$'s of Class I with certainty below the $m$ $y$'s of Class II and we suppose that he has a 
probability $p$ of correctly ranking the two sorts of $y$. Apart from this relaxation we use the 
same model as before. If after comparisons the composition of the sequence is $b_1, a_1, b_2, \ldots$; 
where $b_i$ and $a_i$ are the number of $y$'s and of $x$'s in the $i$th group of each, respectively, then 
the probability of such a sequence is proportional to 

\[ \phi^{-\sum_i b_i A_i} \text{ times the coefficient of } z^{l} \text{ in } \prod_{i} \prod_{j=1}^{b_i} (1 + z\,\phi^{2A_i + B_i + j}), \]

where $A_i = \sum_{j<i} a_j$ and $B_i = \sum_{j<i} b_j$. 

It seems clear that there is no sufficient statistic for ¢. If we expand the above expression 
in powers of 64 = ¢—1 we find that it is proportional to 


ae ed 







Now U* is equal to 3r,(r. + 1) plus twice the number of paired comparisons between x and y. 
It is, therefore, essentially Wilcoxon’s statistic. If the rank of the ith y is R; and 


then V= : > (t,-2)?. 





It will be noticed that when $l = r_2 - l$ the test statistic is $V$. 

This $\phi$ model which we have just set up would seem to be a little difficult to translate into 
general terms. The intrusion of Wilcoxon's statistic suggests that the model might be used 
to represent a change in the parameters of location as well as those of dispersion and there 
is no question but that a change in the former would mask almost entirely a change in the 
latter. The $V$ of the test function is very highly correlated with the sample variance of the 
$r_2$ $y$'s, a statistic which has already been suggested (David, 1956) as a test for dispersion. But 
the situation is not entirely clear and has no obvious interpretation. 

In this present paper we have put forward a model for the alternate hypothesis to 
randomness in a sequence of two alternatives, which would seem appropriate, for example, in 
a situation where the order in the sequence results from a judge's ranking. We have shown 
how two different variants of this model lead to statistics which are optimum to detect 
specified departures from randomness and have been able to suggest general interpretations 
of such departures. In the first case we have shown that Wilcoxon's statistic is sufficient and 
we have compared it with two other criteria (Mood's median and the number of runs) which 
have been suggested for testing the same null hypothesis. In further discussion of ranking 
models alternate to randomness we shall give models for which: (i) the number of runs is 
sufficient and compare it with Wilcoxon's and Mood's statistics; (ii) Mood's statistic is 
sufficient and compare it with Wilcoxon's statistic and the number of runs. 


REFERENCES 


Barton, D. E. & David, F. N. (1958). Ann. Hum. Genet. 22, 250. 

David, F. N. (1956). Biometrika, 43, 485. 

Freeman, G. H. (1953). Biometrika, 40, 287. 

Haldane, J. B. S. & Smith, C. A. B. (1948). Ann. Eugen., Lond., 14, 117. 

Mann, H. & Whitney, D. R. (1947). Ann. Math. Statist. 18, 50. 

Mallows, C. L. (1957). Biometrika, 44, 114. 

Mood, A. M. (1950). Introduction to the Theory of Statistics. New York: McGraw-Hill. 

Wald, A. & Wolfowitz, J. (1940). Ann. Math. Statist. 11, 147. 







178 Non-randomness in a sequence of two alternatives. I 


Table 1. Power distributions for $S$, $b$ and $T$ under the $\phi$ alternative 

| $r_1$ | $r_2$ | Criterion | $\phi = 1.0$ | 0.9 | 0.8 | 0.7 | 0.6 | 0.5 | 0.4 | 0.3 | 0.2 |
| 5 | 5 | $S$ | 0.05 | 0.075 | 0.160 | 0.310 | 0.506 | 0.703 | 0.855 | 0.945 | 0.987 |
| 6 | 4 | $S$ | 0.05 | 0.074 | 0.153 | 0.295 | 0.481 | 0.673 | 0.830 | 0.931 | 0.981 |
| 7 | 3 | $S$ | 0.05 | 0.070 | 0.135 | 0.251 | 0.407 | 0.578 | 0.737 | 0.862 | 0.945 |
| 5 | 5 | $b$ | 0.05 | 0.064 | 0.111 | 0.192 | 0.301 | 0.432 | 0.561 | - | - |
| 5 | 5 | $T$ | 0.05 | 0.059 | 0.091 | 0.151 | 0.243 | 0.364 | 0.502 | 0.645 | 0.799 |
| 6 | 4 | $T$ | 0.05 | 0.058 | 0.087 | 0.143 | 0.229 | 0.346 | 0.482 | 0.626 | 0.765 |
| 7 | 3 | $T$ | 0.05 | 0.059 | 0.090 | 0.149 | 0.239 | 0.355 | 0.489 | 0.628 | 0.765 |









































(Irregularities in the distributions are due to the device which was used to make them comparable, 
i.e. the assumption of continuity.) 


Table 2. Power distribution for the rank difference test for dispersion under the $\phi$ alternative 

| $r_1$ | $r_2$ | Critical region | $\phi = 1.0$ | 0.9 | 0.8 | 0.7 | 0.6 | 0.5 | 0.4 |
| 5 | 5 | 5% | 0.05 | 0.086 | 0.144 | 0.235 | 0.366 | 0.533 | 0.769 |
| 5 | 5 | 2½% + 2½% | 0.05 | 0.059 | 0.089 | 0.148 | 0.249 | 0.394 | 0.572 |








































Table 3. Distribution of $S^* = S_2 - S_1$ 

Note. If $r_2$ is even the dichotomy is $l + l = r_2$; if $r_2$ is odd the dichotomy is $l + (l+1) = r_2$. $S_1$ is the sum of the lower ranks ($l$); $S_2$ is the sum of the higher ranks ($l$ or $l+1$). 

















































































































































































































































































































[Body of Table 3, printed sideways in the original, is not legible in this copy. It gives the frequency distributions of $S^*$ for the dichotomies described in the note.] 








180 


Non-randomness in a sequence of two alternatives. I 


Table 4. Nettlehead plants in a lattice design 





Mean S* 


o g* 
(S*—&(S*) +P) org 


Mean T 
or 





Row number 
























































rz 
| (2-8(D)+ Dlr 





[Lattice plan (column headings 39, 40, ..., 48 are partly legible): positions of nettlehead plants, marked N, by row; the layout is not recoverable from the scan.]
30 30 30 30 30 30 30 30 30 30 30 
19 20 | 2 | 19 23 16 23 20 18 15 15 
= Sea a a 7 10 12 15 15 
2 Ee eee. 5 w/e ase 3 5 6 vf 7 
6 | aE. 6 4 7 4 5 6 8 8 
27 | «620 31 27 12 28 12 25 34 4 61 
& | 50 79 85 67 100 38 80 128 184 180 
59 | 30 48 58 53 72 26 55 94 140 119 
93 | r0-45 62 101-26 85-8461 | 124 124 
11-9781 | 11-5052 : : 10-9011 | 12-1076 ‘ / 11-9154 | 12-2654 
—2-7968 | —3-4727 | —1-9082 | —2-8803 | —0-7797 | -2-3759 | —3-2566 | —1-2998 | +0-7260 | +1-3452 | —0-3600 
| | 
1493 | 14:33 11-73 15-93 15-4 16 
2-493 | 2-381 ; 1-991 2-679 ‘ 2-579 ; 2-691 
13 | 7 13 13 15 8 9 10 13 16 | 12 
-—0575 | -2870 | -0-350 | -0575 | +1892 | -2-775 | -1-122 | -1-610 | —0-737 | +0-185 hee: 

















[ 181 ] 


SIMPLIFIED RUNS TESTS AND LIKELIHOOD RATIO 
TESTS FOR MARKOFF CHAINS†


By LEO A. GOODMAN 
University of Chicago 


1. This paper will first discuss the ‘group’ or ‘run’ test for randomness in a single 
sequence of alternatives. This test was presented by David (1947) for the case where there 
are two kinds of alternatives. The case where there is an arbitrary fixed number, s > 2, of 
kinds of alternatives will also be considered, where the single sequence of alternatives 
consists of a long chain of observations. A simple derivation of some long sequence group 
(or run) tests will be presented by making use of a result due to Bartlett (1951) concerning 
the asymptotic distribution of the observed transition numbers in a probability chain. 
This derivation indicates some close relationships between standard asymptotic results for 
multinomial trials and certain results in the large sample distribution theory of runs (see, 
for example, Mood, 1940), so that a discussion of it may further the understanding of this 
distribution theory. The simplified group tests, developed herein for testing hypotheses 
concerning a probability chain consisting of s states, are also intended to illuminate some 
results presented by David (1947), Moore (1953) and Barton & David (1957) on group tests 
of randomness. 

The general approach presented herein indicates that these group tests will be appro- 
priate for testing the null hypothesis of randomness, or certain specific generalizations of 
this null hypothesis, against certain specific kinds of alternate hypotheses concerning Markoff
chains. This general approach also leads to some simplified tests, which are similar to stan- 
dard tests of independence in contingency tables, for hypotheses that have been considered 
by Hoel (1954) and Good (1955). These simplified ‘contingency table’ tests for Markoff 
chains, which are related to, although different from, the likelihood ratio tests given by 
Good and Hoel, are tests of certain specific generalizations of the null hypothesis of random- 
ness, and are in general different from the group tests. Good, in the errata to his article, has 
referred the reader to the results presented in this paper and has agreed to the correction of 
a number of inaccuracies which will be pointed out below. 

In a previous paper by Anderson & Goodman (1957), there was a discussion of hypotheses 
and tests concerning Markoff chains which are quite different from the main ones described 
herein (e.g. the hypotheses studied in that paper did not lead directly to the development 
and use of group tests). This earlier paper was concerned mainly with the case where there 
are a large number of observed sequences from a Markoff chain of fixed (perhaps even short) 
length, while there was one brief section (§ 5) in it dealing with the case of a single observed 
sequence consisting of a long chain of observations. 


2. David (1947) has suggested that as a test of randomness in a sequence of alternatives, 
where either the event $E$ or $\bar{E}$ will occur in a single trial (i.e. the case of two alternatives),
the total number $k$ of observed groups of $E$ and $\bar{E}$ that appear in the sequence should be


† Research carried out at the Statistical Research Center, University of Chicago, under sponsorship
of the Statistics Branch, Office of Naval Research. The author is indebted to T. W. Anderson and Ingram 
Olkin for some very helpful comments. 










compared with the conditional distribution of $k$, under the null hypothesis of randomness,
when the number $n_1$ of observed $E$'s and the number $n_2$ of observed $\bar{E}$'s in the sequence are
given. It can be seen that the observed $k$ is approximately twice the number $n_{12}$ of times
that an $E$ is followed by an $\bar{E}$ in the sequence. More precisely, the observed transition
number $n_{12}$ is such that $|n_{12} - \tfrac{1}{2}k| \le 1$. Thus, when the sequence consists of a long chain of
observations, the test suggested by David may be closely approximated by comparing
the observed transition number $n_{12}$ with its conditional distribution under the null
hypothesis. Since the transition numbers, and the initial state, form a set of sufficient statistics
for the parameters in the transition probability matrix of a Markoff chain with a constant
matrix (see Anderson & Goodman, 1957), we shall discuss tests which approximate the
group tests but are based directly on the transition numbers. The distribution of the
transition probabilities has been studied by Bartlett (1951) and Whittle (1955), and will
now be discussed here.
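The relation between the number of groups and the transition numbers can be checked on a small example. The following Python sketch is an illustration only and is not part of the original paper; the sequence and the helper names `transition_counts` and `count_groups` are hypothetical, and the letter `F` stands in for the complementary event $\bar{E}$.

```python
from collections import Counter


def transition_counts(seq):
    """Count the direct transitions n_ij between consecutive observations."""
    return Counter(zip(seq, seq[1:]))


def count_groups(seq):
    """Number k of groups (runs): one more than the number of changes of state."""
    if not seq:
        return 0
    return 1 + sum(a != b for a, b in zip(seq, seq[1:]))


seq = "EEFEFFFEEF"            # hypothetical sequence of two alternatives
n = transition_counts(seq)
k = count_groups(seq)
n12 = n[("E", "F")]           # number of times E is followed by F
n21 = n[("F", "E")]           # number of times F is followed by E
```

In this example the changes of state are exactly the transitions counted by `n12` and `n21`, so `k` equals `n12 + n21 + 1`, and `n12` stays within one unit of `k / 2`.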


3. Consider a Markoff chain with transition probability matrix $P = (p_{ij})$; i.e. $p_{ij}$ is the
probability that the variate takes the value $j$ at time $t$, conditional on the value having been
$i$ at the previous time $t-1$, where $p_{ij}$ is a constant independent of $t$. We shall assume that
the number of states is finite, and shall number the states $1, 2, \ldots, s$. David (1947) studied
the case where $s = 2$, and the chain was assumed to be stationary. Suppose now that we have
$n$ consecutive observations from the chain, and that the number of observed direct transitions
from $i$ to $j$ is $n_{ij}$ ($i, j = 1, 2, \ldots, s$). Bartlett (1951) has shown that, if $P$ has no eigenvalues
on the unit circle except the simple value $\lambda = 1$, so that the chain is ergodic and
irreducible, then the $n_{ij}$ are asymptotically normally distributed with expected values
$m_{ij} \sim nP_i\,p_{ij}$, where the asymptotic occupation probabilities of the chain are denoted by $P_i$.
Whittle (1955) has shown that the variates $y_{ij} = (n_{ij} - m_{ij})/\sqrt{n}$ have asymptotically the
frequency function

$$P(y) = \text{const.}\,\exp\Big[-\tfrac{1}{2}\sum_i \sum_j \sum_m \frac{y_{ij}\,y_{im}}{P_i}\Big(\frac{\delta_{jm}}{p_{ij}} - 1\Big)\Big], \tag{1}$$

where $\delta_{jm}$ has the value 1 or 0 according as $j$ and $m$ are equal or unequal, $\sum_j y_{ij} = \sum_j y_{ji}$
($i = 1, 2, \ldots, s$), $\sum_i \sum_j y_{ij} = 0$, and $y = (y_{ij})$.


Let $x_{ij} = (n_{ij} - n_{i\cdot}\,p_{ij})/\sqrt{nP_i}$, where $n_{i\cdot} = \sum_j n_{ij}$. Then $x_{ij} = (y_{ij} - p_{ij}\sum_m y_{im})/\sqrt{P_i}$, $\sum_j x_{ij} = 0$,
and

$$\sum_i \sum_j (x_{ij}^2/p_{ij}) = \sum_i \sum_j \sum_m \frac{y_{ij}\,y_{im}}{P_i}\Big(\frac{\delta_{jm}}{p_{ij}} - 1\Big). \tag{2}$$

Since the Jacobian of a linear transformation is constant, it can then be seen that the
variates $x_{ij}$, which are a linear combination of asymptotically normal variates, have
asymptotically the frequency function

$$P'(x) = \text{const.}\,\exp\Big[-\tfrac{1}{2}\sum_i \sum_j (x_{ij}^2/p_{ij})\Big], \tag{3}$$

where $\sum_j x_{ij} = 0$, and $x = (x_{ij})$. Let $z_{ij} = (n_{ij} - n_{i\cdot}\,p_{ij})/\sqrt{n_{i\cdot}}$. Since $n_{i\cdot}/(nP_i)$ converges in
probability to 1, the variates $x_{ij} - z_{ij} = x_{ij}\{1 - \sqrt{nP_i/n_{i\cdot}}\}$ converge in probability to 0,
and the variates $z_{ij}$ have asymptotically the frequency function $P'(z)$, which is the same as
the function obtained for the $x_{ij}$.

The asymptotic frequency function for the $z_{ij}$ is the same as that obtained when a fixed
sample of size $n_{i\cdot}$ is drawn from a multinomial population of classes $j = 1, 2, \ldots, s$ with










associated probabilities $p_{ij}$, where $n_{ij}$ is the number of elements falling in the $j$th class from
the $i$th sample, and $s$ independent samples ($i = 1, 2, \ldots, s$) are drawn (see, for example,
Wilks, 1944, p. 217). The maximum likelihood estimates of $p_{ij}$ are $\hat{p}_{ij} = n_{ij}/n_{i\cdot}$, and
$z_{ij} = (\hat{p}_{ij} - p_{ij})\sqrt{n_{i\cdot}}$. The large sample variances and covariances of the $z_{ij}$, or of the
$w_{ij} = z_{ij}\sqrt{nP_i/n_{i\cdot}} = \sqrt{nP_i}\,(\hat{p}_{ij} - p_{ij})$, are obtained simply from the standard asymptotic
results for multinomial variates; i.e. $\sigma^2_{ij,ij} = p_{ij}(1 - p_{ij})$, $C_{ij,il} = -p_{ij}\,p_{il}$ for $j \ne l$, $C_{ij,kl} = 0$
for $i \ne k$ (see Bartlett, 1951, p. 93). Thus, the large sample variances and covariances of the
$\hat{p}_{ij}$ are obtained as standard multinomial formulae where, however, the $n_{i\cdot}$ are replaced by
their asymptotic expected values $nP_i$. We have the additional result that the $\hat{p}_{ij}$, when
properly normed, are asymptotically normally distributed, and the frequency function is
determined by $P'(z)$, where the constant in the formula can be evaluated directly from the
standard asymptotic results for multinomial variates.

Since $|n_i - n_{i\cdot}| \le 1$, where $n_i$ is the number of observations in state $i$, it is clear that the
asymptotic statements presented in this section also hold true when $n_{i\cdot}$ is replaced by $n_i$.
For the sake of simplicity, we shall deal with $n_i$, rather than $n_{i\cdot}$, in much of what follows
herein.
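As an illustration of the estimates $\hat{p}_{ij} = n_{ij}/n_{i\cdot}$ described in this section, the following Python sketch tabulates the transition numbers of a single observed sequence and forms the row-wise proportions. It is a minimal sketch under stated assumptions: the function name `estimate_transitions` and the example sequence are hypothetical and do not come from the paper.

```python
from collections import Counter


def estimate_transitions(seq, states):
    """Return (n_ij, n_i., p_hat_ij) for one observed sequence."""
    counts = Counter(zip(seq, seq[1:]))                 # transition numbers n_ij
    row_totals = {i: sum(counts[(i, j)] for j in states) for i in states}
    p_hat = {
        i: {j: counts[(i, j)] / row_totals[i] for j in states}
        for i in states
        if row_totals[i] > 0   # no estimate for a state that is never left
    }
    return counts, row_totals, p_hat


seq = [1, 1, 2, 1, 2, 2, 2, 1, 1, 2]   # hypothetical observations, s = 2
counts, row_totals, p_hat = estimate_transitions(seq, states=[1, 2])
n_state = Counter(seq)                  # n_i: number of observations in state i
```

Each row of `p_hat` is the usual multinomial estimate for a sample of size `row_totals[i]`, and `n_state[i]` differs from `row_totals[i]` by at most one because only the final observation has no outgoing transition.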


4. Let us now reconsider the case when $s = 2$. The null hypothesis of randomness (i.e.
independence of successive observations) states that $p_{12} = p_{22}$, and a long-sequence test
of this hypothesis can be obtained by computing $v = (\hat{p}_{12} - \hat{p}_{22})/\sqrt{\hat{p}_2\,\hat{q}_2\,[n_1^{-1} + n_2^{-1}]}$, where
$\hat{p}_2 = (n_1\hat{p}_{12} + n_2\hat{p}_{22})/n \sim (n_{12} + n_{22})/n = n_{\cdot 2}/n \sim n_2/n$ and $\hat{q}_2 = 1 - \hat{p}_2$, which is a standard
procedure for comparing observed proportions from two large independent samples. From
the asymptotic distribution results presented in the preceding section, we have that
$v \sim (\hat{p}_{12} - \hat{p}_{22})\sqrt{n} = w_{12}/\sqrt{P_1} - w_{22}/\sqrt{P_2} + (p_{12} - p_{22})\sqrt{n}$, and the asymptotic mean of $v$ is
$(p_{12} - p_{22})\sqrt{n}$, and the variance is $p_{12}\,p_{11}/P_1 + p_{22}\,p_{21}/P_2$. If the null hypothesis that
$p_{12} = p_{22} = p_2$ is true, then $p_{ij} = P_j$, and the mean of $v$ is 0, the variance is

$$p_2(1 - p_2)/P_1 + p_2(1 - p_2)/P_2 = p_2 + (1 - p_2) = 1,$$

and the asymptotic distribution of $v$ is the unit normal. For any alternate hypothesis
$p_{12} \ne p_{22}$, the asymptotic mean of $v$ approaches infinity as $\sqrt{n} \to \infty$, while the variance
remains a constant that depends on the values of $p_{12}$ and $p_{22}$.

We also have that

$$v \sim (n_{12} - n_1 n_{22}/n_2)\sqrt{n}/n_1 \sim [n_{12} - n_1(n_2 - n_{21})/n_2]\sqrt{n}/n_1. \tag{4}$$

Since $|n_{12} - n_{21}| \le 1$,

$$v \sim [n_{12}(1 + n_1/n_2) - n_1]\sqrt{n}/n_1 = [n_{12}\,n/n_2 - n_1]\sqrt{n}/n_1 = [n_{12} - n_1 n_2/n]\,n^{3/2}/(n_1 n_2).$$

Thus, a test of whether $v$ differs significantly from 0, which is a natural long-sequence test
of the null hypothesis of randomness (i.e. $p_{12} = p_{22}$), does in fact test whether the observed
$n_{12}$ differs significantly from $n_1 n_2/n$, or whether $k$ differs significantly from $2n_1 n_2/n$, which
is directly related to the test suggested by David (1947).
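The two forms of $v$ given above can be compared numerically. The Python sketch below is an illustration under stated assumptions, not the author's procedure: the function names are hypothetical, the letter `F` stands for $\bar{E}$, and the first function uses the row totals $n_{1\cdot}$, $n_{2\cdot}$ in place of $n_1$, $n_2$ (the two differ by at most one). It assumes both states are observed and left at least once.

```python
import math
from collections import Counter


def v_two_proportions(seq, e="E"):
    """v as a normed difference between the estimates of p_12 and p_22."""
    c = Counter((a == e, b == e) for a, b in zip(seq, seq[1:]))
    n11, n12 = c[(True, True)], c[(True, False)]
    n21, n22 = c[(False, True)], c[(False, False)]
    n1, n2 = n11 + n12, n21 + n22        # transitions out of each state
    n = n1 + n2
    p12_hat, p22_hat = n12 / n1, n22 / n2
    p2_hat = (n12 + n22) / n             # pooled estimate
    q2_hat = 1 - p2_hat
    return (p12_hat - p22_hat) / math.sqrt(p2_hat * q2_hat * (1 / n1 + 1 / n2))


def v_runs_form(seq, e="E"):
    """v written as [n_12 - n_1 n_2 / n] n^(3/2) / (n_1 n_2)."""
    n = len(seq)
    n1 = sum(x == e for x in seq)
    n2 = n - n1
    n12 = sum(a == e and b != e for a, b in zip(seq, seq[1:]))
    return (n12 - n1 * n2 / n) * n * math.sqrt(n) / (n1 * n2)


seq = "EF" * 50                           # hypothetical strictly alternating sequence
v_a = v_two_proportions(seq)
v_b = v_runs_form(seq)
```

For this alternating sequence the two values are close (about 9.95 and 10), and both are far in the upper tail of the unit normal, as expected when there are many more groups than randomness would produce.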

Both David (1947) and Moore (1953) assume that the chain is stationary. Since we have 
assumed that the observed chain is long and that it is ergodic and a stationary distribution 
is approached, it is not necessary to assume here that the chain is stationary. E. L. Lehmann, 
in a communication to the present author, mentioned that, for a stationary chain of $E$'s
and $\bar{E}$'s (i.e. $s = 2$), the number of runs of $E$'s, the number of runs of $\bar{E}$'s, the number of $E$'s,
and the number of $\bar{E}$'s, form a sufficient set of statistics. He also proved that the one-sided










group test suggested by David for the null hypothesis of randomness and the alternate
hypothesis of positive dependence is essentially uniformly most powerful unbiased against
this alternate hypothesis. These results are closely related to the fact mentioned earlier
herein that the set of transition numbers (and the initial state) form a set of sufficient
statistics. When $s = 2$, there are four transition numbers; but since $|n_{12} - n_{21}| \le 1$, the set
$n_{11}, n_{12}, n_{22}$, or the set $n_{12}, n_1, n_2$, for long chains, will be approximately a set of sufficient
statistics. Also, since $n_1 + n_2$ is fixed ($= n$), the statistics $n_{12}$ and $n_1$ will approximate a
sufficient set. Since $|n_{12} - \tfrac{1}{2}k| \le 1$, the number $k$ of runs and the number $n_1$ of $E$'s also
approximate a sufficient set, and tests can be based either on the statistics $n_{12}$ and $n_1$ or on
$k$ and $n_1$. The one-sided test of whether $v$ differs significantly from 0 is based on $n_{12}$ and $n_1$,
and it is an approximation, for long chains, to the uniformly most powerful unbiased
test.

The asymptotic distribution presented here for the variate $v \sim [n_{12} - n_1 n_2/n]\,n^{3/2}/(n_1 n_2)$,
when the null hypothesis of randomness is assumed, is similar to, although not identical
with, an asymptotic distribution in the theory of runs (see Mood, 1940, p. 381). The result
presented by Mood was obtained for the case where randomness is assumed and the ratios
$n_i/n$ remain fixed as $n \to \infty$, while in our case the $n_i$ are random variables. He also deals with
the case where the $n_i$ are random variables, but in that case he studies the distribution, under
the null hypothesis, of $(n_{12} - np_1 p_2)/\sqrt{n(p_1 p_2 - 3p_1^2 p_2^2)}$, where $p_{12} = p_{22} = p_2$ and $p_1 = 1 - p_2$
(Mood, 1940, p. 392). It is possible to derive the asymptotic distribution under the null
hypothesis presented herein from Mood's results, and some of Mood's results can be derived
from the approach given here. The proof given here followed directly from the asymptotic
distribution of $v$, which is the normed difference between two observed proportions. The
results presented herein lead directly to the asymptotic distributions when both null and
alternate hypotheses are considered, while the distribution theory of runs in Mood (1940)
deals only with the null hypothesis. The derivation given here indicates the close similarity
between some asymptotic results in the distribution theory of runs and the standard
multinomial distribution theory. This point will be discussed further in § 6.


5. David (1947) discusses the power function of the group test where there are $n_1$ observed
$E$'s and $n_2$ observed $\bar{E}$'s. The null hypothesis considered is that $p_{12} = p_{22}$; thus,
$p_{11} = p_{21} = P_1$. The alternate hypothesis is that $p_{11} > P_1$, where $P_1 = P_1 p_{11} + P_2 p_{21}$, or
$P_1 = p_{21}/(p_{12} + p_{21})$. Since the alternate hypothesis is that $p_{11} > p_{21}/(p_{12} + p_{21})$, we have that
$p_{11}p_{12} + p_{21}p_{11} > p_{21}$, or that $p_{11} > p_{21}$; i.e. $p_{12} < p_{22}$. Thus, a one-sided test of whether
$v \sim (\hat{p}_{12} - \hat{p}_{22})\sqrt{n} = (\hat{p}_{21} - \hat{p}_{11})\sqrt{n} \sim (n_{12} - n_1 n_2/n)\,n^{3/2}/(n_1 n_2)$ differs significantly from 0 is
applicable; the rejection region of approximate size $\alpha$ is $v < \xi_\alpha$, where $\xi_\alpha$ is the quantile of
order $\alpha$ for the unit normal distribution. Hence, the null hypothesis is rejected when $n_{12}$,
or $\tfrac{1}{2}k$, is significantly small. The power function of the test may be approximated, in a very
rough manner, by the use of the fact that $v - (p_{12} - p_{22})\sqrt{n}$ is asymptotically normally
distributed with mean 0 and variance $p_{12}\,p_{11}/P_1 + p_{22}\,p_{21}/P_2$.
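The one-sided rejection rule and the rough normal approximation to the power described above can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function names and the numerical values are hypothetical, and `statistics.NormalDist` from the Python standard library (Python 3.8 or later) supplies the unit normal quantile $\xi_\alpha$.

```python
import math
from statistics import NormalDist


def reject_randomness(v, alpha=0.05):
    """One-sided rule: reject when v falls below the quantile of order alpha."""
    return v < NormalDist().inv_cdf(alpha)


def approx_power(p12, p22, n, alpha=0.05):
    """Rough power from the asymptotic normal distribution of v."""
    p11, p21 = 1 - p12, 1 - p22
    P1 = p21 / (p12 + p21)               # stationary probability of state 1
    P2 = 1 - P1
    mean = (p12 - p22) * math.sqrt(n)    # asymptotic mean of v
    sd = math.sqrt(p12 * p11 / P1 + p22 * p21 / P2)
    return NormalDist(mean, sd).cdf(NormalDist().inv_cdf(alpha))


power_null = approx_power(0.4, 0.4, 100)     # equals alpha under the null hypothesis
power_100 = approx_power(0.3, 0.5, 100)      # hypothetical alternative, p12 < p22
power_400 = approx_power(0.3, 0.5, 400)
```

Under the null hypothesis the approximation returns the nominal size, and under a fixed alternative with $p_{12} < p_{22}$ it increases with $n$, in line with the statement that the mean of $v$ grows like $\sqrt{n}$ while the variance stays constant.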

David (1947) presents sections of the exact conditional power surface for $P_1 = 0.5$, $0.6$
and $0.75$. She states that 'it is clear that the test is most powerful when the number of
alternates are equal; i.e. when $n_1 = n_2$. The power declines sharply when $n_1$ increases at the expense
of $n_2$'. We shall discuss briefly some aspects of the more general case where $P_1 \ge 0.5$, and the
observed sequence is large, which may further illuminate the preceding statement. Since
$P_1 p_{12} = P_2 p_{21}$, then $p_{21}/p_{12} = P_1/P_2 \ge 1$. When the alternate hypothesis $p_{11} \ge P_1$ ($p_{12} \le p_{22}$)
is true, and it is also assumed that $p_{21} \ge p_{12}$, then it can be shown, simply by considering







the curve $p(1 - p)$ as a function of $p$, that $r = p_{22}\,p_{21}/(p_{12}\,p_{11}) \ge 1$. Also, $r = 1$ only if $P_1 = 0.5$
or if the null hypothesis is true.

Let us now consider the case where a sample of size $n_1$ is drawn from a binomial population
with parameter $p_{12}$, and another independent sample of size $n_2$ is drawn from a binomial
population with parameter $p_{22}$. The allocation of sample sizes $n_1$ and $n_2$, where $n = n_1 + n_2$
is fixed, which minimizes the variance of $v = (\hat{p}_{12} - \hat{p}_{22})\sqrt{n}$ is determined by the solution $x_0$
of the equation $x^2(r - 1) + 2x - 1 = 0$, where $r = p_{22}\,p_{21}/(p_{12}\,p_{11})$ and $n_1/n = x_0$. When
$r = 1$, then $x_0 = \tfrac{1}{2}$, and $x_0$ decreases as the value of $r$ increases. Thus, when $r > 1$, then $x_0 < \tfrac{1}{2}$.
Also, when $r > 1$ in the case of two independent samples of sizes $n_1$ and $n_2$, it is clear that the
variance of $v$ is smaller when $n_1/n = \tfrac{1}{2}$ than when $n_1/n > \tfrac{1}{2}$; but that the variance is smaller
when $n_1/n = x_0 < \tfrac{1}{2}$ than when $n_1/n = \tfrac{1}{2}$. Thus, the variance of $v$ is minimized when $n_1 = n_2$
in the case when $r = 1$ (i.e. when $P_1 = 0.5$ or when the null hypothesis is true), and only in
that case. Also, when the null hypothesis is that $p_{12} = p_{22}$ and the alternate hypothesis is
that $p_{12} < p_{22}$, and it is also assumed that $p_{12} < p_{21}$ (or $P_1 > 0.5$), then the variance of $v$ is
smaller when $n_1 = n_2$ than when $n_1 > n_2$; however, for each alternate hypothesis where
$p_{11} > P_1 > 0.5$, it is possible to determine values of $n_1$ and $n_2$ so that $n_1 < n_2$ and the variance
of $v$ is smaller for these values than for $n_1 = n_2$.
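The allocation argument can be made concrete. With $x = n_1/n$, the variance of $v$ for two independent binomial samples is $p_{12}p_{11}/x + p_{22}p_{21}/(1 - x)$, and setting its derivative to zero gives $x^2(r - 1) + 2x - 1 = 0$, whose root in $(0, 1)$ is $x_0 = 1/(1 + \sqrt{r})$. The Python sketch below evaluates this; the function names and the parameter values are hypothetical and serve only as an illustration.

```python
import math


def optimal_fraction(r):
    """Root x0 in (0, 1) of x^2 (r - 1) + 2x - 1 = 0, i.e. 1 / (1 + sqrt(r))."""
    return 1.0 / (1.0 + math.sqrt(r))


def variance_of_v(x, p12, p22):
    """Variance of (p12_hat - p22_hat) sqrt(n) when n1 / n = x."""
    return p12 * (1 - p12) / x + p22 * (1 - p22) / (1 - x)


p12, p22 = 0.2, 0.5                                  # hypothetical alternative
r = p22 * (1 - p22) / (p12 * (1 - p12))              # r = p22 p21 / (p12 p11)
x0 = optimal_fraction(r)
var_at_x0 = variance_of_v(x0, p12, p22)
var_at_half = variance_of_v(0.5, p12, p22)
```

Here $r = 1.5625$, so $x_0 = 1/2.25 < \tfrac{1}{2}$, and the variance at $x_0$ (0.81) is slightly smaller than at the equal allocation (0.82), which matches the qualitative statement in the text.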

It was shown earlier that there is a close similarity between the asymptotic distribution
of certain statistics (for example, $v$) that can be computed from an observed sequence from
a Markoff chain and related statistics computed from data obtained from independent
samples from multinomial populations. The preceding paragraph, which discussed the case
of two independent samples from binomial populations, was intended as a suggestive
introduction to certain exact results presented by David (1947). This discussion suggests that
the test of randomness under discussion, when the observed sequence is large, will be most
powerful when $n_1 = n_2$, if the alternate hypothesis is limited to $P_1 = 0.5$. Also, $n_1 = n_2$
will be preferable to $n_1 > n_2$ when $P_1$ is assumed to be not less than $0.5$, and $n_1 = n_2$ will be
preferable to $n_1 < n_2$ when $P_1 \le 0.5$. This discussion is only suggestive and cannot serve as
a proof, since (a) the exact conditional power surfaces, given $n_1$ and $n_2$, are of interest (but
we have not considered herein the conditional distribution of $v$, given $n_1$ and $n_2$, but rather
the asymptotic unconditional distribution, which however is closely related to the asymptotic
conditional distribution), (b) the large sequence results presented herein for certain
statistics from a Markoff chain depend on the fact that $n_i/n$ converges in probability to $P_i$,
and (c) we did not discuss directly the power of tests but rather the variance of estimates.


6. Moore (1953) presents a procedure to test the null hypothesis, $H_0$, 'that there is
randomness within the sequence', and he implicitly suggests that this is the same null
hypothesis considered by David (1947). In applying the procedure suggested by Moore
(1953), we see that the null hypothesis that he actually considers is that $p_{11} = p_{21} = p_1$,
where the common value $p_1$ must be specified under the null hypothesis as well as the values
of $p_{11}$ and $p_{21}$ under the alternate hypothesis. This hypothesis thus differs from the
hypothesis of randomness (i.e. independence of successive observations) considered by David
(1947), which did not require the specification of $p_1$ under the null hypothesis nor the
specification of $p_{11}$ and $p_{21}$ under the alternate hypothesis. Moore (1953) states that the
likelihood ratio in this case is

$$L = \frac{\Pr\{t \mid n_1, n_2, H_1\}}{\Pr\{t \mid n_1, n_2, H_0\}} = \frac{(P_1/p_{21} + P_2/p_{12})\,p_{11}^{n_1}\,p_{22}^{n_2}}{2\,p_1^{n_1}\,p_2^{n_2}}\Big(\frac{p_{12}\,p_{21}}{p_{22}\,p_{11}}\Big)^{t}, \tag{5}$$










where $2t$ is the number of groups observed. Moore focused attention 'on the case of an even
number of groups, bearing in mind that the results, at any rate for a large number of groups,
apply equally well to the case of an odd number of groups'.

It can be seen that the probability of obtaining a given observed sequence from a Markoff
chain, when the initial state is fixed (i.e. not random), is simply $\prod_i \prod_j p_{ij}^{n_{ij}}$. Thus, the
likelihood ratio, when the initial state is fixed, is
$L^* = p_{11}^{n_{11}}\,p_{12}^{n_{12}}\,p_{21}^{n_{21}}\,p_{22}^{n_{22}}/(p_1^{n_{11}+n_{21}}\,p_2^{n_{12}+n_{22}})$, which is
of the same form as the likelihood ratio obtained when the observations consist of two
independent samples of sizes $n_{1\cdot} = n_{11} + n_{12}$ and $n_{2\cdot} = n_{21} + n_{22}$ from binomial populations
with parameters $p_{11}$ and $p_{21}$, respectively; the null hypothesis is that $p_{11} = p_{21} = p_1$ (specified),
and the alternate hypothesis is that $p_{11} \ne p_{21}$ and these values are also specified. The
likelihood ratio $L^{**}$, which is obtained when the initial state is not fixed, is $L^* P_1/p_1$ if the
observed initial state is $E$, or $L^* P_2/p_2$ if it is $\bar{E}$, in the case where it is assumed, with Moore
(1953), that the Markoff chain is stationary; i.e. $P_1 = P_1 p_{11} + P_2 p_{21}$ is the probability that
the initial state is $E$. Again, the close similarity between the methods of inference about
Markoff chains and inference based on independent samples from multinomial, or binomial,
populations is apparent.
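The fixed-initial-state likelihood ratio $L^*$ is most conveniently evaluated on the logarithmic scale. The following Python sketch is only an illustration: the function name `log_L_star`, the example sequences and the parameter values are hypothetical, `F` stands for $\bar{E}$, and all specified probabilities are assumed to lie strictly between 0 and 1.

```python
import math
from collections import Counter


def log_L_star(seq, p11, p21, p1, e="E"):
    """log L* for H1: (p11, p21) specified, against H0: p11 = p21 = p1."""
    c = Counter((a == e, b == e) for a, b in zip(seq, seq[1:]))
    n11, n12 = c[(True, True)], c[(True, False)]
    n21, n22 = c[(False, True)], c[(False, False)]
    log_h1 = (n11 * math.log(p11) + n12 * math.log(1 - p11)
              + n21 * math.log(p21) + n22 * math.log(1 - p21))
    log_h0 = (n11 + n21) * math.log(p1) + (n12 + n22) * math.log(1 - p1)
    return log_h1 - log_h0


same = log_L_star("EEFEFFFEEF", p11=0.5, p21=0.5, p1=0.5)      # H1 coincides with H0
clumped = log_L_star("EEEEEFFFFF", p11=0.8, p21=0.2, p1=0.5)   # favours persistence
```

When the alternative coincides with the null hypothesis the log ratio is zero, and for a clumped sequence an alternative with $p_{11} > p_{21}$ gives a positive value, as with two independent binomial samples.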

The ratio $L$ presented by Moore (1953) differs somewhat from $L^*$ and $L^{**}$. We shall now
examine this difference. The joint probability distribution of the observed number $n_1$ of
$E$'s, the number $n_2$ of $\bar{E}$'s, and the number $k$ of groups, in a sequence of a given number
$n = n_1 + n_2$ of observations, is

$$\Pr\{k, n_1, n_2 \mid H_1\} = \begin{cases}
p_{11}^{n_1} p_{22}^{n_2}\Big(\dfrac{P_1}{p_{21}} + \dfrac{P_2}{p_{12}}\Big)\dbinom{n_1-1}{t-1}\dbinom{n_2-1}{t-1}\Big(\dfrac{p_{12}\,p_{21}}{p_{11}\,p_{22}}\Big)^{t} & \text{for } k = 2t \ge 2; \\[2ex]
p_{11}^{n_1} p_{22}^{n_2}\Big[\dfrac{P_1}{p_{11}}\dbinom{n_1-1}{t}\dbinom{n_2-1}{t-1} + \dfrac{P_2}{p_{22}}\dbinom{n_1-1}{t-1}\dbinom{n_2-1}{t}\Big]\Big(\dfrac{p_{12}\,p_{21}}{p_{11}\,p_{22}}\Big)^{t} & \text{for } k = 2t + 1 \ge 3; \\[2ex]
P_1\,p_{11}^{n_1-1} \text{ (if } n_2 = 0\text{) or } P_2\,p_{22}^{n_2-1} \text{ (if } n_1 = 0\text{)} & \text{for } k = 0 \text{ and } \min[n_1, n_2] = 0;
\end{cases} \tag{6}$$

where $t \le \min[n_1, n_2]$ and $\binom{n_i - 1}{n_i} = 0$; this fact can be seen to follow from an argument
similar to that appearing in David (1947), where there is a derivation of the conditional
probability, $\Pr\{k \mid n_1, n_2, H_1\}$, of $k$ given $n_1$ and $n_2$. Moore (1953) uses the symbol for the
conditional probability, and implies that he is concerned with the conditional probability
of $k$ given $n_1$ and $n_2$, but the formulae that he presents seem to be actually related to the
unconditional joint probability $\Pr\{k, n_1, n_2 \mid H_1\}$, rather than to the conditional probability.
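In the special case where $p_{ij} = p_j = P_j$, the formulae for $\Pr\{k, n_1, n_2 \mid H_1\}$ reduce to

$$\Pr\{k, n_1, n_2 \mid H_0\} = \begin{cases}
2\dbinom{n_1-1}{t-1}\dbinom{n_2-1}{t-1}\,p_1^{n_1} p_2^{n_2} & \text{for } k = 2t \ge 2; \\[2ex]
\Big[\dbinom{n_1-1}{t}\dbinom{n_2-1}{t-1} + \dbinom{n_1-1}{t-1}\dbinom{n_2-1}{t}\Big]\,p_1^{n_1} p_2^{n_2} & \text{for } k = 2t + 1 \ge 3; \\[2ex]
p_1^{n_1} p_2^{n_2} & \text{for } k = 0 \text{ and } \min[n_1, n_2] = 0.
\end{cases} \tag{7}$$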








We have that 


CIE) CIOS) CON CIE 












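Writing $2\min[n_1, n_2] + 1 = K$, the joint distribution of $n_1$ and $n_2$ is

$$\Pr\{n_1, n_2 \mid H_0\} = \sum_{k=2}^{K} \Pr\{k, n_1, n_2 \mid H_0\} = \binom{n}{n_1} p_1^{n_1} p_2^{n_2} \quad \text{for } \min[n_1, n_2] \ne 0. \tag{8}$$

Thus, we have that

$$\Pr\{k \mid n_1, n_2, H_0\} = f_k \Big/ \binom{n}{n_1} \quad \text{for } k \ge 2, \tag{9}$$

and

$$\Pr\{0 \mid n_1, n_2, H_0\} = 1 \quad \text{for } \min[n_1, n_2] = 0, \tag{10}$$

where $f_{2t} = 2\binom{n_1-1}{t-1}\binom{n_2-1}{t-1}$ and $f_{2t+1} = f_{2t}(n - 2t)/(2t)$.

This is a simplification of the formula for $\Pr\{k \mid n_1, n_2, H_0\}$ given by David (1947, p. 334),
since the term $\sum_{\text{all } t} f_t$, which appears in David's formula, is not evaluated explicitly there,
while we find that it is equal to $\binom{n}{n_1}$ (see also, Wilks, 1944, p. 203).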
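The conditional distribution of $k$ given $n_1$ and $n_2$ under randomness can be tabulated directly from the counts $f_k$ and the binomial coefficient $\binom{n}{n_1}$. The Python sketch below is an illustration with hypothetical function names; it assumes $n_1 \ge 1$ and $n_2 \ge 1$ and uses exact rational arithmetic so that the probabilities can be checked to sum to one.

```python
from fractions import Fraction
from math import comb


def f(k, n1, n2):
    """Number of arrangements of n1 E's and n2 complements having exactly k groups."""
    if k < 2:
        return 0
    t, odd = divmod(k, 2)
    if odd:   # k = 2t + 1: the sequence starts and ends with the same kind
        return (comb(n1 - 1, t) * comb(n2 - 1, t - 1)
                + comb(n1 - 1, t - 1) * comb(n2 - 1, t))
    return 2 * comb(n1 - 1, t - 1) * comb(n2 - 1, t - 1)   # k = 2t


def conditional_pmf(n1, n2):
    """Pr{k | n1, n2, H0} = f_k / C(n, n1) for k = 2, ..., K."""
    total = comb(n1 + n2, n1)
    K = 2 * min(n1, n2) + 1
    return {k: Fraction(f(k, n1, n2), total) for k in range(2, K + 1)}


pmf_small = conditional_pmf(2, 1)   # sequences EEF, EFE, FEE
pmf = conditional_pmf(5, 4)
```

For $n_1 = 2$, $n_2 = 1$ the three possible sequences give two groups twice and three groups once, which the function reproduces; `math.comb` returns 0 when the lower index exceeds the upper, so terms such as $\binom{n_i - 1}{n_i}$ vanish automatically.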


The ratio $L$ computed by Moore (1953) is, in fact, $L = \Pr\{k, n_1, n_2 \mid H_1\}/\Pr\{k, n_1, n_2 \mid H_0\}$,
when $k = 2t$, rather than the ratio of the conditional probabilities $\Pr\{k \mid n_1, n_2, H_1\}$ and
$\Pr\{k \mid n_1, n_2, H_0\}$ as was suggested in his paper. By some direct calculation, it can be seen
that $L = L^{**}$ in the case where $k = 2t$; i.e. the case considered by Moore (1953). However,
when $k = 2t + 1$, the value of $L$ will, in general, differ from $L^{**}$, except when the alternate
hypothesis includes the assumption that $P_1 = 0.5$. The difference between $L$ and $L^{**}$ is
due to the fact that $L^{**}$ is based on the probability of obtaining a given sequence of
observations, while $L$ is based on the probability of obtaining a given set of observed values for
$k$, $n_1$, and $n_2$.

Moore (1953) also considers the particular case where, under the null hypothesis,
$p_{11} = p_{21} = p_1 = 0.5$ (i.e. 'perfect' randomness), and under the alternate hypothesis
$P_1 = 0.5$ and $p_{11}$ is equal to some particular value other than $0.5$. Thus, under both the null
and alternate hypotheses $p_{12} = p_{21}$ and $p_{11} = p_{22}$. The null hypothesis states that
$p_{11} = p_{22} = 0.5$ (thus, $p_{12} = p_{21} = 0.5$), and the alternate hypothesis states that $p_{11} = p_{22}$
is equal to some particular value other than $0.5$.

When $p_{12} = p_{21}$, the likelihood function of the observed sequence, when the initial state
is fixed, is simply $p_{11}^{n_{11}+n_{22}}\,p_{12}^{n_{12}+n_{21}}$, which is of the same form as the likelihood function
obtained when a sample of size $n_{1\cdot} + n_{2\cdot} = n - 1$ is drawn from a binomial population. Since
$P_1$ and $p_1$ both equal $0.5$, we have that

$$L^{**} = L^* = L = 2^{n-1}\,p_{11}^{n_{11}+n_{22}}\,p_{12}^{n_{12}+n_{21}} = 2^{n-1}\,p_{11}^{n-1-n_{12}-n_{21}}\,p_{12}^{n_{12}+n_{21}}. \tag{11}$$

Since $k = n_{12} + n_{21} + 1$, we have that

$$L = 2^{n-1}\,p_{11}^{n-k}\,p_{12}^{k-1}, \tag{12}$$

which can be applied for both even and odd values of $k$. We see that $L$ is of the same form as
obtained in testing that the probability of a 'success', in a sample of $n - 1$ binomial trials,
is $0.5$ under $H_0$, and $p_{12}$ (specified) under $H_1$, where $k - 1$ 'successes' are observed. Thus, the
procedure suggested by Moore can be generalized so that it is applicable for both even and
odd $k$ values, and this procedure can be reinterpreted so that it is of the same form as the
related standard method for tests based on binomial trials.
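Formula (12) depends on the data only through $n$ and $k$, which the following Python sketch makes explicit. It is an illustration with a hypothetical function name and hypothetical inputs; the logarithm is used to avoid overflow for long sequences, and $p_{11}$ is assumed to lie strictly between 0 and 1.

```python
import math


def log_L_perfect(seq, p11):
    """log of L = 2^(n-1) p11^(n-k) p12^(k-1), with p12 = 1 - p11."""
    n = len(seq)
    k = 1 + sum(a != b for a, b in zip(seq, seq[1:]))   # number of groups
    p12 = 1 - p11
    return (n - 1) * math.log(2) + (n - k) * math.log(p11) + (k - 1) * math.log(p12)


at_null = log_L_perfect("EEFEFFFEEF", p11=0.5)       # alternative equals the null
few_groups = log_L_perfect("EEEEEFFFFF", p11=0.8)    # k = 2, persistence favoured
many_groups = log_L_perfect("EFEFEFEFEF", p11=0.8)   # k = 10, persistence disfavoured
```

The value is zero when $p_{11} = 0.5$, positive for a sequence with few groups when the alternative specifies $p_{11} > 0.5$, and negative for a strictly alternating sequence, exactly as in a binomial test with $k - 1$ 'successes' in $n - 1$ trials.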


Both the null and alternate hypotheses considered by Moore (1953) were simple hypo- 
theses, while the hypotheses considered by David (1947) were composite. Let us now modify 










the particular case, which was considered by Moore (1953), so that the alternate hypothesis
will be composite. Consider the null hypothesis $H_0$ that $p_{11} = p_{21} = p_1 = 0.5$ and the
alternate hypothesis $H_1$ that $P_1 = 0.5$ and $p_{11}$ is unspecified. Thus, $p_{11} = p_{22} = 0.5$ under $H_0$,
and $p_{11} = p_{22}$ is unspecified under $H_1$. Using the results presented in the preceding sections,
we see that a large sample test of this hypothesis can be obtained by computing

$$u = [(n_1\hat{p}_{11} + n_2\hat{p}_{22})/n - 0.5]\big/\big(\tfrac{1}{2}n^{-1/2}\big),$$

which is a standard procedure for comparing an observed proportion with $0.5$. Since
$p_{11} = p_{22}$, the estimates $\hat{p}_{11}$ and $\hat{p}_{22}$ are pooled, and the pooled estimate $(n_1\hat{p}_{11} + n_2\hat{p}_{22})/n$ is
compared with $0.5$. From the asymptotic distribution results presented in § 3, we have
that $u \sim (\sqrt{n_1}\,z_{11} + \sqrt{n_2}\,z_{22} - \tfrac{1}{2}n + n_1 p_{11} + n_2 p_{22})\,2/\sqrt{n}$, and the asymptotic mean of $u$ is
$\sqrt{n}\,(2p_{11} - 1)$ and the variance is $4(P_1 p_{11} p_{12} + P_2 p_{22} p_{21})$. Under the null hypothesis, the mean
is 0, the variance is 1, and the asymptotic distribution of $u$ is the unit normal.

Since $n_1\hat{p}_{11} + n_2\hat{p}_{22} \sim n_{11} + n_{22} \sim n - (n_{12} + n_{21}) \sim n - k$, where $k$ is the observed number of
groups, we see that $u \sim ((n - k)/n - \tfrac{1}{2})\,2\sqrt{n} = (\tfrac{1}{2}n - k)\,2/\sqrt{n}$. The statistics $n_1$ and $n_2$ do not
enter into this expression for $u$; this is due to the fact that $k$ (or, more precisely, $k$ and the
initial state) is a sufficient statistic when it is assumed that $p_{12} = p_{21}$. This fact can be seen
to hold true by the following approach. When it is not assumed that $p_{12} = p_{21}$, then the set of
sufficient statistics, when $n$ is fixed, is $n_{11}, n_{12}, n_{21}, n_{22}$, and the initial state. As was
mentioned earlier herein, this set of sufficient statistics can be approximated, for long chains,
by the statistics $k$ and $n_1$. If $p_{12} = p_{21}$, then the set of sufficient statistics, when $n$ is fixed,
is $n_{11} + n_{22}$, $n_{12} + n_{21}$, and the initial state. Since $(n_{11} + n_{22}) + (n_{12} + n_{21}) = n - 1$, this set of
sufficient statistics can be approximated, for long chains, by the statistic $n_{12} + n_{21}$, which is
approximately equal to $k$.
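Because $u$ depends on the observations only through $n$ and $k$, it takes a few lines to compute. The Python sketch below uses the long-sequence form $u \sim (\tfrac{1}{2}n - k)\,2/\sqrt{n}$; the function name and the two example sequences are hypothetical, and `F` stands for $\bar{E}$.

```python
import math


def u_statistic(seq):
    """Long-sequence form u ~ (n/2 - k) * 2 / sqrt(n), k = number of groups."""
    n = len(seq)
    k = 1 + sum(a != b for a, b in zip(seq, seq[1:]))
    return (n / 2 - k) * 2 / math.sqrt(n)


u_clumped = u_statistic("E" * 50 + "F" * 50)   # k = 2: very few groups
u_alternating = u_statistic("EF" * 50)         # k = 100: as many groups as possible
```

A large positive value indicates fewer groups than 'perfect' randomness would give ($p_{11} > 0.5$), and a large negative value indicates more groups ($p_{11} < 0.5$); both are referred to the unit normal distribution.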

We have seen that the hypotheses and tests discussed in this section are quite different 
from those considered by David (1947). However, when the sequence of observations is 
long, all of these tests can be approximated by group tests; i.e. tests based on the statistics
$k$, $n_1$, and $n_2$. In the special case of $s = 2$, the set of statistics $k$, $n_1$, $n_2$ will approximate a
sufficient set, when there is a long observed sequence, and thus all reasonable tests for
hypotheses considered herein will be group tests. This will no longer be true when $s > 2$.
Simplified group tests for the more general case where $s \ge 2$ will now be presented.


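7. Let us assume that, for a fixed $j$, $p_{ij} = p_j$ (unspecified) for all $i \ne j$. The null hypothesis
to be tested is that $p_{jj} = p_j$. A large sample test of this hypothesis can be obtained by
computing

$$v_j = \Big[\sum_{i \ne j} n_i\hat{p}_{ij}/(n - n_j) - \hat{p}_{jj}\Big]\Big/\sqrt{\hat{p}_j\,\hat{q}_j\Big(\frac{1}{n - n_j} + \frac{1}{n_j}\Big)}, \tag{13}$$

where $\hat{p}_j = \sum_i n_i\hat{p}_{ij}/n \sim n_j/n$ and $\hat{q}_j = 1 - \hat{p}_j$, which is a standard procedure for comparing
observed proportions from two large independent samples. From the asymptotic
distribution results presented in § 3, we have that

$$v_j \sim \Big[\sum_{i \ne j} n_i\hat{p}_{ij}/(n - n_j) - \hat{p}_{jj}\Big]\sqrt{n} \sim \Big[\sum_{i \ne j} \sqrt{n_i}\,z_{ij} - z_{jj}(n - n_j)/\sqrt{n_j} + \sum_{i \ne j} n_i\,p_{ij} - p_{jj}(n - n_j)\Big] \times \sqrt{n}/(n - n_j), \tag{14}$$

the asymptotic mean of $v_j$ is $(p_j - p_{jj})\sqrt{n}$ and the variance is

$$\sum_{i \ne j} P_i\,p_j\,q_j/(1 - P_j)^2 + p_{jj}\,q_{jj}/P_j = p_j\,q_j/(1 - P_j) + p_{jj}\,q_{jj}/P_j,$$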





where $q_j = 1 - p_j$ and $q_{jj} = 1 - p_{jj}$. If the null hypothesis is true, then $P_j = \sum_i p_{ij}P_i = p_j$,
and the mean is 0, the variance is $p_j + q_j = 1$, and the asymptotic distribution of $v_j$ is the
unit normal. For any alternate hypothesis $p_{jj} \ne p_j$, the asymptotic mean of $v_j$ approaches
infinity as $\sqrt{n} \to \infty$, while the variance remains a constant that depends on the values of $p_{jj}$
and $p_j$.

We also have that

$$v_j \sim [k_j - n_{jj}(n - n_j)/n_j]\sqrt{n}/(n - n_j) \sim [k_j - (n_j - k_j)(n - n_j)/n_j]\sqrt{n}/(n - n_j) = [k_j - n_j(n - n_j)/n]\,n^{3/2}/[n_j(n - n_j)], \tag{15}$$

where $k_j$ is the number of groups of observations in state $j$ in the sequence, and $k_j \sim \sum_{i \ne j} n_{ij}$.

This result concerning the asymptotic distribution of [k,;—n,(n—mn,)/n] n3/{n,(n—n,)}, 
when the null hypothesis is true, is similar to, although not identical with, an asymptotic 
distribution in the theory of runs (see Mood, 1940, p. 383), and the method of proof is 
somewhat different. The distribution theory of runs discussed in Mood (1940) deals only 
with the hypothesis of randomness and does not consider the distributions obtained when 
other null and alternate hypotheses may be true, while the asymptotic distributions for 
certain kinds of alternate hypotheses have been given herein. The variables $m_{ij}$, which appear
in the proof of Mood's Theorem 6.2, p. 384, but which are not given any statistical interpretation
there, are seen to have the same asymptotic distribution as that obtained, under
the hypothesis of randomness, for the transition numbers $n_{ij}$ considered here; i.e. the
asymptotic variances and covariances given by Mood in (6.11) are of the same form as
obtained for the related statistics, where the $m_{ij}$ are replaced by $n_{ij}$, and the simplified approach
presented herein is applied. Thus, we have seen herein that the run test based on the number
$k_j$ of runs of observations appearing in state $j$ is applicable as a test of the null hypothesis
that $p_{ij} = p_j$ for all $i$, and the alternate hypothesis is that $p_{ij} = p_j$ (unspecified) for all
$i \ne j$ and $p_{jj} \ne p_j$.

Since $j$ is given, the null hypothesis is not the same as the hypothesis of randomness.
The hypothesis of randomness is, in a sense, more restrictive than the null hypothesis considered
here, and the result proved here, that $v_j$ is asymptotically unit normal when $p_{ij} = p_j$
for all $i$, is a stronger result than when proved under the hypothesis of randomness. Thus, in
this sense, the asymptotic result given herein, when the null hypothesis is true, is a stronger
result than that obtained in the standard distribution theory of runs. If the hypothesis of
randomness is, in fact, true, then the null hypothesis considered here will also be true, and
the asymptotic distribution obtained for this null hypothesis will also hold in the case of
randomness. The test based on $v_j$ is a long-sequence group test for a generalization of the
hypothesis of randomness considered by David (1947) for the case where $s = 2$.
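
As a rough illustration of the group test described above, the following Python sketch computes $k_j$, $n_j$ and the final form of (15), $v_j = [k_j - n_j(n - n_j)/n]\, n^{3/2}/[n_j(n - n_j)]$, from an observed sequence. The function name `run_statistic`, the integer coding of states and the example sequence are assumptions made only for this illustration.

```python
def run_statistic(seq, j):
    """Return (k_j, n_j, v_j) for state j, using the final form of (15).

    seq is a list of state labels; both j and at least one other state
    must occur in seq, so that n_j and n - n_j are positive.
    """
    n = len(seq)
    n_j = sum(1 for x in seq if x == j)
    # k_j: number of groups (runs) of consecutive observations in state j,
    # counted by locating the first observation of each group.
    k_j = sum(1 for t, x in enumerate(seq)
              if x == j and (t == 0 or seq[t - 1] != j))
    v_j = (k_j - n_j * (n - n_j) / n) * n ** 1.5 / (n_j * (n - n_j))
    return k_j, n_j, v_j


# Hypothetical sequence with s = 3 states coded 0, 1, 2.
seq = [0, 0, 1, 1, 1, 0, 2, 2, 0, 1, 0, 0]
k_j, n_j, v_j = run_statistic(seq, 0)
```

For a long sequence, $v_j$ would be referred to the unit normal distribution; the twelve observations used here are far too few for that and serve only to show the arithmetic.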

We shall now consider a generalization of the hypothesis discussed at the end of the
preceding section. Let us now assume that, for all $j$, $p_{jj} = p$ (unspecified). The null hypothesis
to be tested is that $p = 1/s$. A large sample test of this hypothesis is obtained by computing

$$u = \Big[\sum_j n_j \hat p_{jj} - n/s\Big]\, s/\sqrt{n(s - 1)}, \tag{16}$$

where $\hat p_{jj} = n_{jj}/n_j$. From the asymptotic distribution results presented in §3, we have that

$$u \sim \Big[\sum_j \sqrt{n_j}\, z_{jj} - n/s + \sum_j n_j p_{jj}\Big]\, s/\sqrt{n(s - 1)}.$$

The asymptotic mean of $u$ is $(sp - 1)\sqrt{n}/\sqrt{s - 1}$ and the variance is $\sum_j P_j\, p(1 - p)\, s^2/(s - 1) = p(1 - p)\, s^2/(s - 1)$. Under the null hypothesis,
the mean is 0, the variance is 1, and the asymptotic distribution is the unit normal.


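Since

$$\sum_j n_j \hat p_{jj} \sim \sum_j n_{jj} \sim n - \sum_j \sum_{i \ne j} n_{ij} \sim n - \sum_j k_j \sim n - k,$$

where $k$ is the observed number of groups, we see that

$$u \sim (n - k - n/s)\, s/\sqrt{n(s - 1)} = [n(s - 1)/s - k]\, s/\sqrt{n(s - 1)}. \tag{17}$$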


It should be noticed that the test based on $u$ is not a test of randomness, since the null hypothesis
is simply that $p_{jj} = 1/s$ for all $j$. In the case where $s = 2$, the null hypothesis is that $p_{ij} = 0.5$
for all $i, j$, and the alternate hypothesis is that $p_{11} = p_{22}$. The hypothesis of ‘perfect’ randomness
(i.e. $p_{ij} = 1/s$ for all $i, j$) is, in a sense, more restrictive, for $s > 2$, than the null hypothesis
considered here, and the result proved here that $u$ is asymptotically unit normal when
$p_{jj} = 1/s$, for all $j$, is a stronger result than when proven under the hypothesis of ‘perfect’
randomness. Thus, the test based on $u$ is a long-sequence group test for a generalization of
the hypothesis of ‘perfect’ randomness, which is related to the hypothesis of randomness
considered in the particular case discussed by Moore (1953).
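
The statistic in (17) depends on the sequence only through $n$ and the total number of groups $k$. A minimal Python sketch of that computation follows; the function names `group_count` and `u_statistic` and the example data are hypothetical, chosen only for illustration.

```python
from math import sqrt


def group_count(seq):
    """Number k of groups (maximal runs of equal observations) in seq."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def u_statistic(seq, s):
    """Right-hand side of (17): [n(s-1)/s - k] s / sqrt(n(s-1))."""
    n = len(seq)
    k = group_count(seq)
    return (n * (s - 1) / s - k) * s / sqrt(n * (s - 1))


# Hypothetical sequence with s = 3 states.
seq = [0, 0, 1, 1, 1, 0, 2, 2, 0, 1, 0, 0]
k = group_count(seq)
u = u_statistic(seq, 3)
```

Large positive values of `u` correspond to fewer groups than expected when $p = 1/s$ (i.e. $p > 1/s$), and large negative values to more groups.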

We have seen that the number $k_j$ of groups of observations in state $j$ in the sequence is
approximately equal to $\sum_{i \ne j} n_i \hat p_{ij}$, where the $\hat p_{ij} = n_{ij}/n_i$ are maximum likelihood estimates of
the first-order transition probabilities in a Markoff chain, and that the asymptotic distribution
theory for the $\hat p_{ij}$ could be used to obtain results relating to the $k_j$. A similar remark
applies to the number $k$ of observed groups in the sequence. It can also be seen that the
number ${}_i k_j$ of groups of observations of length $i$ in state $j$ can be approximated by a function
of the maximum likelihood estimates of the $(i + 1)$th order transition probabilities in a
Markoff chain (see Anderson & Goodman (1957)). For example,

$${}_1 k_j \sim \sum_{i \ne j} \sum_{l \ne j} n_{ijl} \sim \sum_{i, l \ne j} n_{ij}\, \hat p_{ijl}.$$

Thus, the asymptotic distribution theory for the estimates $\hat p_{ijl}$ of the second-order transition
probabilities can lead to results related to the number of groups of observations of length
one in state $j$. The general approach presented in this paper can be used to obtain results
concerning asymptotic distributions relating to the ${}_i k_j$ under various null and alternate
hypotheses.

Let us now assume that, for all $i$, $p_{ii} = p$ (unspecified) and also that, for all $i$ and $j$ where
$i \ne j$, $p_{ij} = q$ (unspecified). Then $p + (s - 1)q = 1$, and the probability of obtaining a specified
sequence from the Markoff chain, when the initial state is fixed, is

$$\prod_i \prod_j p_{ij}^{n_{ij}} = p^{\sum_i n_{ii}}\, q^{\sum_{i \ne j} n_{ij}} = q^{k-1} p^{n-k} = (1 - p)^{k-1}\, p^{n-k}\, (s - 1)^{1-k}.$$

Thus, in this case, $k$ is a sufficient statistic for $p$, and the maximum likelihood estimate for
$p$ is $(n - k)/(n - 1) \sim (n - k)/n$. The likelihood ratio statistic for the null hypothesis that $p = 1/s$
against a simple alternate hypothesis that $p$ is a specified value $\ne 1/s$, is of the same form
as the standard likelihood ratio obtained in testing the null hypothesis that a sample of
$n - 1$ observations came from a binomial population with parameter $p = 1/s$ against the
simple alternate hypothesis that the parameter was a specified value $p \ne 1/s$. Tests based on
the statistic $u$ (which is a function of $k$ and $n$) are of the same form as standard large sample
tests of the null hypothesis that the parameter in a binomial population is $p = 1/s$, and these
tests are asymptotically equivalent (under the null hypothesis) to the corresponding likelihood
ratio tests. Thus, for the case considered in this paragraph, the statistic $u$ is justified
as a basis for a large sample test of the null hypothesis that $p = 1/s$. If some simple null
hypothesis other than $p = 1/s$ is of interest, $u$ can be modified accordingly, and comments
analogous to those presented here will continue to hold true for the modified $u$.
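
The likelihood in the preceding paragraph, $(1 - p)^{k-1} p^{n-k} (s - 1)^{1-k}$, and its maximizing value $(n - k)/(n - 1)$ can be checked numerically. The sketch below is only an illustration; the function names and the values of $n$, $k$ and $s$ are hypothetical.

```python
from math import log


def log_likelihood(p, n, k, s):
    """Log of (1-p)^(k-1) p^(n-k) (s-1)^(1-k), for 0 < p < 1."""
    return (k - 1) * log(1 - p) + (n - k) * log(p) - (k - 1) * log(s - 1)


def mle_p(n, k):
    """Maximum likelihood estimate of p; k enters as the sufficient statistic."""
    return (n - k) / (n - 1)


# Hypothetical values: n = 12 observations, k = 7 groups, s = 3 states.
n, k, s = 12, 7, 3
p_hat = mle_p(n, k)
```

Evaluating `log_likelihood` slightly to either side of `p_hat` gives smaller values, as expected for a binomial-type likelihood with $n - k$ successes in $n - 1$ trials.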

The comments in the preceding paragraph are closely related to the work of Barton &
David (1957) (see p. 174 and Corrigenda) where they state that the use of $k$ is equivalent to
the likelihood ratio test of the null hypothesis that $\theta = 0$ in the case where $p_{jj} = (1 + \theta)/s$
for all $j$, and $p_{ij} = [1 - \theta/(s - 1)]/s$ for all $i$ and $j$ where $i \ne j$. However, the discussion in the
present paper has led to the suggestion that large sample tests based on the statistic $u$
are appropriate to test the null hypothesis that $p = 1/s$ (i.e. $\theta = 0$), while Barton & David
suggest implicitly a quite different statistic: viz. $v^* = (n - k - F_2/n)/\sigma_k$, where $F_i = \sum_j n_j^{(i)}$, with $n_j^{(i)} = n_j(n_j - 1)\cdots(n_j - i + 1)$,
and

$$\sigma_k^2 = \frac{F_2(n - 3)}{n(n - 1)} + \frac{F_2^2}{n^2(n - 1)} - \frac{2F_3}{n(n - 1)}.$$

The statistic $v^*$ can be seen to be asymptotically equivalent to

$$\Big(n - \sum_j n_j^2/n - k\Big)\Big/\sigma_k = \Big[\sum_j n_j(n - n_j)/n - k\Big]\Big/\sigma_k = -\sum_j v_j\, n_j(n - n_j)\, n^{-3/2}/\sigma_k.$$

Thus, $v^*$ is asymptotically equivalent to a weighted sum of the $v_j$ statistics. Some justification
for the use of large sample tests based on $v_j$ (for a given value of $j$) was presented earlier
in this section for some specific null and alternate hypotheses, and this approach can also
be used to determine the asymptotic distributions of $v^*$ under both these null and alternate
hypotheses when they hold true for all $j$. (Barton & David (1957, p. 171) give the asymptotic
distribution under the assumption of random arrangement on a line.) It should be mentioned,
however, that no justification has been presented in the present paper for the use of
$v^*$ (i.e. this particular weighted sum of the $v_j$) as a large sample test of any specified null and
alternate hypotheses; and the implicit justification of its use given by Barton & David
would apply as strongly (if not more strongly) to the statistic $u$ as it does to the statistic $v^*$.
Their implicit justification of $v^*$ is based on their study of the conditional probability of
obtaining a specified sequence from a Markoff chain, for a given composition of numbers
$n_i$ of each type. Since, as we saw in the preceding paragraph, the probability of obtaining
a specified sequence, for the particular hypotheses under consideration, depends only on
$k$, $n$, and $p$ (i.e. $(1 + \theta)/s$), and not on the values of $n_i$ ($i = 1, 2, \ldots, s$), and a fortiori the likelihood
ratio does not depend on the $n_i$, there does not seem to be any need to study the conditional
probability of obtaining a specified sequence, for given values of $n_i$. If the (unconditional)
probability of obtaining a specified sequence is studied, as was done in the
preceding paragraph, and if the (unconditional) distribution of the likelihood ratio is used,
then justification is found for large sample tests based on the statistic $u$, but not for tests
based on the statistic $v^*$, for the particular hypotheses under consideration.


8. The group tests presented in the preceding section were for the hypothesis of random- 
ness (or perfect randomness), and for generalizations of this hypothesis, against some specific 
kinds of alternate hypotheses. We shall now not restrict ourselves to these specific kinds of 
alternate hypotheses. 

It will be convenient to deal here with a cyclic sequence of observations, as well as with
the sequence of observations considered earlier. Associated with every sequence $S$ of observations
is a corresponding cyclic sequence $\bar S$ defined by regarding the first element of $S$ as
immediately following its last one. We shall denote properties of $\bar S$ by placing a bar over the
corresponding algebraic symbol relating to $S$.

Let us first consider the null hypothesis of ‘perfect’ randomness ($p_{ij} = 1/s$, for all $i, j$)
against the general alternate hypothesis that $p_{ij} \ne 1/s$ for some $i, j$; the usual definition of
‘perfect’ randomness also assumes that the chain is stationary, but we shall not be concerned
with this here. From the results given in §3, we see that, under this null hypothesis,
the asymptotic distribution of

$$G_1(i) = n_i \sum_j (\hat p_{ij} - 1/s)^2/(1/s) = s \sum_j z_{ij}^2 = \sum_j z_{ij}^2/p_{ij}$$

will be a $\chi^2$-distribution with $s - 1$ degrees of freedom, and $G_1 = \sum_i G_1(i)$ will have an asymptotic
$\chi^2$-distribution with $s(s - 1)$ degrees of freedom.

It can be seen that, under the null hypothesis, $\bar G_1 = \sum_i \bar G_1(i) = \sum_i \bar n_i \sum_j (\bar p_{ij} - 1/s)^2 s$ will
have a $\chi^2$-distribution with $s(s - 1)$ degrees of freedom, where $\bar p_{ij} = \bar n_{ij}/\bar n_i$, $\bar n_{ij}$ is the number
of direct observed transitions from $i$ to $j$ in $\bar S$, and $\bar n_i = \bar n_{i\cdot} = \sum_j \bar n_{ij} = \sum_j \bar n_{ji}$ is the number
of observations in $\bar S$ which are in $i$. We have that

$$\begin{aligned} \bar G_1 &= \sum_i \sum_j (\bar n_{ij} - \bar n_i/s)^2\, s/\bar n_i \\ &= \sum_i \sum_j [(\bar n_{ij} - n/s^2)^2 - (1/s)^2 (\bar n_i - n/s)^2]\, s/\bar n_i \\ &= \sum_{i,j} (\bar n_{ij} - n/s^2)^2\, s/\bar n_i - \sum_i (\bar n_i - n/s)^2/\bar n_i, \end{aligned} \tag{18}$$

where $\bar n = \sum_i \bar n_i = n$. Under the null hypothesis, $\bar n_i/n$ will converge in probability to $1/s$
and thus

$$\bar G_1 \sim \sum_{i,j} (\bar n_{ij} - n/s^2)^2\, s^2/n - \sum_i (\bar n_i - n/s)^2\, s/n = \bar\psi_2^2 - \bar\psi_1^2 = \nabla\bar\psi_2^2,$$

where

$$\bar\psi_\nu^2 = \sum_{\mathbf r} (\bar n_{\mathbf r} - n/s^\nu)^2\, s^\nu/n, \tag{19}$$

and $\bar n_{\mathbf r}$ is the number of $\nu$ consecutive observations in the sequence $\bar S$ which are
$(r_1, r_2, \ldots, r_\nu) = \mathbf r$
(see Good, 1955). Thus, we have shown that, under the null hypothesis, the statistic $\nabla\bar\psi_2^2$,
which was considered by Good (1955), is (for long sequences) essentially the sum of $s$
independent $\chi^2$-goodness-of-fit statistics $\bar G_1(i)$ ($i = 1, 2, \ldots, s$), which are computed in the
standard manner as though the $\bar n_{ij}$, for fixed $i$, were the observations in a sample of size
$\bar n_i$ from a multinomial population, and the null hypothesis was that of an equiprobable
multinomial population; i.e. $p_{ij} = 1/s$. A similar statement can be seen to hold true for
$\nabla\psi_2^2$, which is computed from $S$ rather than $\bar S$.
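
The algebraic identity in (18) holds exactly for any cyclic sequence in which every state occurs, because the row sums $\sum_j \bar n_{ij}$ equal $\bar n_i$. The Python sketch below checks this numerically and also evaluates $\nabla\bar\psi_2^2$ as defined through (19). The function names, the coding of the states as $0, \ldots, s - 1$ and the example sequence are assumptions made for this illustration only.

```python
from collections import Counter


def cyclic_counts(seq):
    """State counts and direct-transition counts in the cyclic sequence."""
    n = len(seq)
    singles = Counter(seq)
    # The last observation is followed by the first one (index wraps around).
    pairs = Counter((seq[t], seq[(t + 1) % n]) for t in range(n))
    return n, singles, pairs


def g1_bar(seq, s):
    """First line of (18): sum over i, j of (n_ij - n_i/s)^2 s / n_i."""
    n, singles, pairs = cyclic_counts(seq)
    return sum((pairs[(i, j)] - singles[i] / s) ** 2 * s / singles[i]
               for i in range(s) for j in range(s))


def g1_bar_decomposed(seq, s):
    """Last line of (18): the same quantity written as a difference."""
    n, singles, pairs = cyclic_counts(seq)
    first = sum((pairs[(i, j)] - n / s ** 2) ** 2 * s / singles[i]
                for i in range(s) for j in range(s))
    second = sum((singles[i] - n / s) ** 2 / singles[i] for i in range(s))
    return first - second


def grad_psi2_bar(seq, s):
    """psi-bar_2^2 - psi-bar_1^2, with psi-bar_nu^2 as in (19)."""
    n, singles, pairs = cyclic_counts(seq)
    psi2 = sum((pairs[(i, j)] - n / s ** 2) ** 2
               for i in range(s) for j in range(s)) * s ** 2 / n
    psi1 = sum((singles[i] - n / s) ** 2 for i in range(s)) * s / n
    return psi2 - psi1


# Hypothetical sequence; every state 0, 1, 2 must occur at least once.
seq = [0, 0, 1, 1, 1, 0, 2, 2, 0, 1, 0, 0]
lhs = g1_bar(seq, 3)
rhs = g1_bar_decomposed(seq, 3)
approx = grad_psi2_bar(seq, 3)
```

`lhs` and `rhs` agree up to rounding, whereas `approx` agrees with them only asymptotically, since it replaces $\bar n_i$ by $n/s$ in the denominators.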

For long sequences, the likelihood ratio test of the null hypothesis of perfect randomness
is based essentially on the statistic

$$L_1 = \sum_i L_1(i) = \sum_i \Big\{2 \sum_j n_{ij} \log[\hat p_{ij}\, s]\Big\}, \tag{20}$$

which is asymptotically equivalent to $G_1 \sim \nabla\psi_2^2$ under the null hypothesis. We have that

$$L_1 = 2\sum_{i,j} n_{ij} \log n_{ij} - 2\sum_i n_{i\cdot} \log n_{i\cdot} + 2(n - 1)\log s.$$

If $L_1$ is replaced by $\bar L_1$, we have

$$\bar L_1 = 2\sum_{i,j} \bar n_{ij} \log \bar n_{ij} - 2\sum_i \bar n_i \log \bar n_i + 2n\log s = \bar K_2 - \bar K_1 + 2n\log s = \nabla\bar K_2 + 2n\log s, \tag{21}$$

where $\bar K_\nu = 2\sum_{\mathbf r} \bar n_{\mathbf r} \log \bar n_{\mathbf r}$ for $\nu \ge 1$ (see Good, 1955). Using the terminology in Good (1955),
we are testing the null hypothesis of perfect randomness, $H_{-1}$, within the alternate hypothesis
of a first-order chain, $H_1$. Good states that the likelihood ratio test for $H_{-1}$ within $H_1$
is based on the statistic $\nabla K_2 - \nabla K_0$, where he defines $K_0 = 2(n - \nu + 1)\log(n - \nu + 1)$,
$\bar K_0 = 2n\log n$, and $K_{-1} = \bar K_{-1} = 0$. This does not agree with the result presented here, and
Good has agreed that his paper contains a number of inaccuracies; see the errata to his
paper, where $\bar K_{-1}$ is redefined as $2n\log ns$ rather than 0. We have found that, when
$\bar K_{-1} = 2n\log ns$ and $\bar K_0 = 2n\log n$, then $\nabla\bar K_0 = -2n\log s$ and $\bar L_1 = \nabla\bar K_2 - \nabla\bar K_0$. Also, for
the non-cyclic (i.e. non-circularized) sequence, we have found that

$$L_1 = K_2 - 2\sum_i n_{i\cdot} \log n_{i\cdot} + 2(n - 1)\log s, \tag{22}$$

rather than $K_2 - K_1 - (K_0 - K_{-1})$. Thus, the result given by Good for non-cyclic sequences remains
incorrect, while the result for circularized sequences has been corrected. For the cyclic
sequence, the statistic $\bar L_1$ can be written simply as $\nabla\bar K_2 - \nabla\bar K_0$, while a corresponding
statement for $L_1$ is not correct in the case of the non-cyclic sequence.
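
The relation (21) between the cyclic likelihood ratio statistic and the quantities $\bar K_\nu$ can be verified on a small example. The following Python sketch is illustrative only; the names `k_bar` and `l1_bar_direct` are hypothetical, and $\bar K_0 = 2n\log n$ is obtained by treating the empty tuple as the single "state" of order zero.

```python
from collections import Counter
from math import log


def k_bar(seq, nu):
    """K-bar_nu = 2 * sum of n_r log n_r over nu-tuples of the cyclic sequence.

    For nu = 0 the only tuple is the empty one, occurring n times, so the
    function returns 2 n log n.
    """
    n = len(seq)
    counts = Counter(tuple(seq[(t + d) % n] for d in range(nu))
                     for t in range(n))
    return 2 * sum(c * log(c) for c in counts.values())


def l1_bar_direct(seq, s):
    """2 * sum of n_ij log(p_ij * s), with cyclic counts and p_ij = n_ij/n_i."""
    singles = Counter(seq)
    pairs = Counter(zip(seq, seq[1:] + seq[:1]))
    return 2 * sum(c * log(c * s / singles[i]) for (i, _), c in pairs.items())


# Hypothetical sequence with s = 3 states.
seq = [0, 0, 1, 1, 1, 0, 2, 2, 0, 1, 0, 0]
s, n = 3, len(seq)
direct = l1_bar_direct(seq, s)
via_k = k_bar(seq, 2) - k_bar(seq, 1) + 2 * n * log(s)
```

The two values coincide up to rounding, which is the content of (21); with $\bar K_{-1} = 2n\log ns$ the term $2n\log s$ is $-\nabla\bar K_0$.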

There seem to be some further inaccuracies concerning the non-cyclic case. Good states
that $\nabla K_2 - \nabla K_1 = \nabla^2 K_2$ is a special case of the likelihood ratio test for contingency tables.
In other words, $K_2 - K_1 - (K_1 - K_0)$ should be equal to

$$2\sum_{i,j} n_{ij} \log[\hat p_{ij}/\hat p_j] = 2\sum_{i,j} n_{ij} \log n_{ij} - 2\sum_i n_{i\cdot} \log n_{i\cdot} - 2\sum_j n_{\cdot j} \log n_{\cdot j} + 2(n - 1)\log(n - 1); \tag{23}$$

this does not seem to us to be the case. However, for the cyclic sequence, the statement is
true; i.e. $\nabla^2\bar K_2 = 2\sum_{i,j} \bar n_{ij} \log(\bar p_{ij}/\bar p_j)$. Also, Good states that $\nabla^2 K_1$ is the likelihood ratio test
for perfect randomness against randomness in general. In other words,

$$\nabla^2 K_1 = (K_1 - K_0) - (K_0 - K_{-1})$$

should be equal to

$$2\sum_i n_i \log(\hat p_i\, s) = 2\sum_i n_i \log n_i - 2n\log n + 2n\log s = K_1 - 2n\log n + 2n\log s;$$

this does not seem to us to be the case, unless $K_0 = \bar K_0$ and $K_{-1} = \bar K_{-1} = 2n\log ns$. If
$\bar K_{-1} = 2n\log ns$, the statement is also true for the cyclic sequence; i.e.

$$\nabla^2\bar K_1 = 2\sum_i \bar n_i \log \bar n_i - 2n\log n + 2n\log s.$$

For the non-cyclic case, it is not possible to equate directly $K_1 = 2\sum_i n_i \log n_i$ with either
$2\sum_i n_{i\cdot} \log n_{i\cdot}$ or $2\sum_j n_{\cdot j} \log n_{\cdot j}$; and also the correct definition of the terms $K_0$ and $K_{-1}$
must vary with the test being considered.

It can be seen that, for long sequences, $L_1$ (and also $\bar L_1 = \nabla\bar K_2 - \nabla\bar K_0$) is, under the null
hypothesis of perfect randomness, essentially the sum of $s$ independent statistics $L_1(i)$,
which are computed in the standard manner as though the $n_{ij}$, for fixed $i$, were observations
from $n_i$ multinomial trials. The statistics $L_1$ and $\bar L_1$ are asymptotically equivalent, under
the null hypothesis, to $G_1$, which has an asymptotic $\chi^2$-distribution with $s(s - 1)$ degrees of
freedom. These statistics are also asymptotically equivalent, under the null hypothesis of
perfect randomness, to $\nabla\psi_2^2$, and any of these statistics can be used to test this null hypothesis
against the alternate hypothesis $H_1$, when the observed sequence is long.

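More generally, consider now the null hypothesis of perfect randomness, $H_{-1}$, within the
alternate hypothesis $H_\nu$ that the chain is of the $\nu$th order. In other words, the alternate
hypothesis is that the sequence of observations is from a Markoff chain with transition
probability matrix defined in terms of the transition probabilities $p_{\mathbf r j}$ that the variate takes
the value $j$ at time $t$, conditional on the values having been $r_1, r_2, \ldots, r_\nu$ (i.e. $\mathbf r$) at times $t - \nu$,
$t - \nu + 1, \ldots, t - 1$, respectively (see Bartlett, 1951), and the null hypothesis is that $p_{\mathbf r j} = 1/s$
(for all $\mathbf r$, $j$), where $s$ is the number of states. Using an approach similar to that applied by
Bartlett (1951) for $\nu$th order chains, it is possible to generalize the results presented earlier
herein. Thus, under the null hypothesis, the asymptotic distribution of

$$G_\nu(\mathbf r) = n_{\mathbf r} \sum_j (\hat p_{\mathbf r j} - 1/s)^2\, s = \sum_j z_{\mathbf r j}^2/p_{\mathbf r j} \tag{24}$$

will be a $\chi^2$-distribution with $(s - 1)$ degrees of freedom, and $G_\nu = \sum_{\mathbf r} G_\nu(\mathbf r)$ will have an asymptotic
$\chi^2$-distribution with $s^\nu(s - 1) = \nabla s^{\nu+1}$ degrees of freedom.

It can be seen that, under the null hypothesis, $\bar G_\nu = \sum_{\mathbf r} \bar G_\nu(\mathbf r)$ will also have an asymptotic
$\chi^2$-distribution with $s^\nu(s - 1) = \nabla s^{\nu+1}$ degrees of freedom. We have that

$$\begin{aligned} \bar G_\nu &= \sum_{\mathbf r} \sum_j (\bar n_{\mathbf r j} - \bar n_{\mathbf r}/s)^2\, s/\bar n_{\mathbf r} = \sum_{\mathbf r} \sum_j [(\bar n_{\mathbf r j} - n/s^{\nu+1})^2 - (1/s)^2 (\bar n_{\mathbf r} - n/s^\nu)^2]\, s/\bar n_{\mathbf r} \\ &= \sum_{\mathbf r} \sum_j (\bar n_{\mathbf r j} - n/s^{\nu+1})^2\, s/\bar n_{\mathbf r} - \sum_{\mathbf r} (\bar n_{\mathbf r} - n/s^\nu)^2/\bar n_{\mathbf r}. \end{aligned} \tag{25}$$

Under the null hypothesis, $\bar n_{\mathbf r}/n$ will converge in probability to $1/s^\nu$, and thus

$$\bar G_\nu \sim \sum_{\mathbf r} \sum_j (\bar n_{\mathbf r j} - n/s^{\nu+1})^2\, s^{\nu+1}/n - \sum_{\mathbf r} (\bar n_{\mathbf r} - n/s^\nu)^2\, s^\nu/n = \bar\psi_{\nu+1}^2 - \bar\psi_\nu^2 = \nabla\bar\psi_{\nu+1}^2 \tag{26}$$

(see Good, 1955). Thus, under the null hypothesis, the statistic $\nabla\bar\psi_{\nu+1}^2$ is (for long sequences)
essentially the sum of $s^\nu$ independent $\chi^2$-goodness-of-fit statistics $\bar G_\nu(\mathbf r)$, which are computed
in the standard manner as though the $\bar n_{\mathbf r j}$, for fixed $\mathbf r$, were the observations from $\bar n_{\mathbf r}$ multinomial
trials. A similar statement can be seen to hold true for $\nabla\psi_{\nu+1}^2$, which is computed
from $S$ rather than $\bar S$.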

For a long sequence, the likelihood ratio test is based essentially on the statistic

$$L_\nu = \sum_{\mathbf r} L_\nu(\mathbf r) = \sum_{\mathbf r} \Big\{2 \sum_j n_{\mathbf r j} \log[\hat p_{\mathbf r j}\, s]\Big\}, \tag{27}$$

which is asymptotically equivalent to $G_\nu$ under the null hypothesis. We have that

$$L_\nu = 2\sum_{\mathbf r} \sum_j n_{\mathbf r j} \log n_{\mathbf r j} - 2\sum_{\mathbf r} n_{\mathbf r\cdot} \log n_{\mathbf r\cdot} + 2(n - \nu)\log s. \tag{28}$$

Again, we find a difference between the result presented here and that given by Good (1955).
However, when the cyclic sequence is used, we have

$$\bar L_\nu = 2\sum_{\mathbf r} \sum_j \bar n_{\mathbf r j} \log \bar n_{\mathbf r j} - 2\sum_{\mathbf r} \bar n_{\mathbf r} \log \bar n_{\mathbf r} + 2n\log s = \bar K_{\nu+1} - \bar K_\nu - \nabla\bar K_0 = \nabla\bar K_{\nu+1} - \nabla\bar K_0.$$

It can be seen that, for long sequences, $L_\nu$ (and also $\bar L_\nu = \nabla\bar K_{\nu+1} - \nabla\bar K_0$) is, under the null
hypothesis, essentially the sum of $s^\nu$ independent statistics $L_\nu(\mathbf r)$, which are computed in
the standard manner as though the $n_{\mathbf r j}$, for fixed $\mathbf r$, were observations from $n_{\mathbf r}$ multinomial
trials. The statistics $L_\nu$ and $\bar L_\nu$ are asymptotically equivalent, under the null hypothesis,
to $G_\nu \sim \nabla\psi_{\nu+1}^2$, which has an asymptotic $\chi^2$-distribution with $s^\nu(s - 1) = \nabla s^{\nu+1}$ degrees of
freedom. (Thus, in the statement by Good, 1955, that ‘$\nabla\psi_\nu^2$ is the asymptotic form of
$\nabla K_{\nu+1} - \nabla K_0$ when $H_{-1}$ is true’, the $\nabla\psi_\nu^2$ should be replaced by $\nabla\psi_{\nu+1}^2$; see the errata to Good’s
paper. Also, the $K$’s should be replaced by the $\bar K$’s.)
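
For a chain of order $\nu$, the quantities $\bar\psi_\nu^2$ of (19) and the differences $\nabla\bar\psi_{\nu+1}^2$ of (26) can be computed by counting $\nu$-tuples in the cyclic sequence. The sketch below is one hypothetical way to organize that computation in Python; the function names and the coding of states as $0, \ldots, s - 1$ are assumptions of the illustration.

```python
from collections import Counter
from itertools import product


def cyclic_tuple_counts(seq, nu):
    """Counts of each nu-tuple of consecutive observations, cyclically."""
    n = len(seq)
    return Counter(tuple(seq[(t + d) % n] for d in range(nu))
                   for t in range(n))


def psi_bar_sq(seq, s, nu):
    """psi-bar_nu^2 = sum over all s^nu tuples r of (n_r - n/s^nu)^2 s^nu / n."""
    n = len(seq)
    counts = cyclic_tuple_counts(seq, nu)
    expected = n / s ** nu
    # Tuples that never occur contribute with a count of zero.
    return sum((counts[r] - expected) ** 2
               for r in product(range(s), repeat=nu)) / expected


def grad_psi_bar_sq(seq, s, nu):
    """The difference psi-bar_{nu+1}^2 - psi-bar_nu^2 appearing in (26)."""
    return psi_bar_sq(seq, s, nu + 1) - psi_bar_sq(seq, s, nu)


def degrees_of_freedom(s, nu):
    """s^nu (s - 1), the degrees of freedom quoted for G_nu."""
    return s ** nu * (s - 1)


# Hypothetical sequence with s = 3 states.
seq = [0, 0, 1, 1, 1, 0, 2, 2, 0, 1, 0, 0]
stat = grad_psi_bar_sq(seq, 3, 1)
```

With a long sequence, `grad_psi_bar_sq(seq, s, nu)` would be compared with a $\chi^2$-distribution on `degrees_of_freedom(s, nu)` degrees of freedom; the short sequence here only exercises the arithmetic.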

The preceding results in this section were concerned with the null hypothesis of ‘perfect’
randomness, $H_{-1}$. Now let us consider the null hypothesis of randomness (i.e. independence
of successive observations) $H_0$, against the alternate hypothesis $H_1$. Under the null hypothesis
that $p_{ij} = p_j$ (unspecified) for all $i$, the asymptotic distribution of the statistic

$$F_{1,0} = \sum_i n_{i\cdot} \sum_j (\hat p_{ij} - \hat p_j)^2/\hat p_j = \sum_i \sum_j (n_{ij} - n_{i\cdot} n_{\cdot j}/n_{\cdot\cdot})^2/(n_{i\cdot} n_{\cdot j}/n_{\cdot\cdot}) \tag{29}$$

(where $n_{\cdot\cdot} = n - 1$), which is similar to the statistic used as a test of homogeneity of proportions
in $s$ independent samples or a test of independence in an $s \times s$ contingency table,
will be a $\chi^2$-distribution with $(s - 1)^2$ degrees of freedom (see Anderson & Goodman, 1957).
Also, the likelihood ratio statistic,

$$M_{1,0} = 2\sum_{i,j} n_{ij} \log(\hat p_{ij}/\hat p_j) = 2\sum_{i,j} n_{ij} \log n_{ij} - 2\sum_i n_{i\cdot} \log n_{i\cdot} - 2\sum_j n_{\cdot j} \log n_{\cdot j} + 2(n - 1)\log(n - 1), \tag{30}$$

is asymptotically equivalent to $F_{1,0}$ under the null hypothesis. Using the cyclic sequence,
$F_{1,0}$ is asymptotically equivalent, under the null hypothesis, to $\bar F_{1,0}$, and also to $\bar M_{1,0}$, which
is equal to

$$2\sum_{i,j} \bar n_{ij} \log \bar n_{ij} - 4\sum_j \bar n_j \log \bar n_j + 2n\log n = \bar K_2 - \bar K_1 - (\bar K_1 - \bar K_0) = \nabla\bar K_2 - \nabla\bar K_1.$$
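
The statistics (29) and (30) are the usual $\chi^2$ and likelihood ratio statistics for the $s \times s$ table of observed transitions. A minimal Python sketch, using the non-cyclic sequence so that the table total is $n - 1$, is given below; the function name `independence_tests` and the example sequence are hypothetical.

```python
from collections import Counter
from math import log


def independence_tests(seq, s):
    """Return (F, M, df) for the s x s table of observed transitions."""
    pairs = Counter(zip(seq, seq[1:]))   # n_ij
    row = Counter(seq[:-1])              # n_i. : transitions out of i
    col = Counter(seq[1:])               # n_.j : transitions into j
    total = len(seq) - 1                 # n_.. = n - 1
    F = 0.0
    for i in row:
        for j in col:
            expected = row[i] * col[j] / total
            F += (pairs[(i, j)] - expected) ** 2 / expected
    # Cells with n_ij = 0 contribute nothing to the likelihood ratio statistic.
    M = 2 * sum(c * log(c * total / (row[i] * col[j]))
                for (i, j), c in pairs.items())
    return F, M, (s - 1) ** 2


# Hypothetical two-state sequence in which every transition occurs twice,
# so the observed table equals its expected values under independence.
F, M, df = independence_tests([0, 0, 1, 1, 0, 0, 1, 1, 0], 2)
```

In this balanced example both statistics vanish; for a long sequence they would be referred to a $\chi^2$-distribution with `df` degrees of freedom, assuming all the $p_j$ are positive.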


The articles referred to herein, which discuss tests of the null hypothesis $H_0$ of randomness,
assumed (either explicitly or implicitly) that all probabilities $p_j$ were positive; we have done
likewise in the preceding paragraph. Also, when indicating the number of degrees of freedom
for some statistics (which were asymptotically $\chi^2$) relating to tests for certain null hypotheses
concerning Markoff chains, these articles assumed (usually implicitly) that all the transition
probabilities in the Markoff chain were positive. For the sake of simplicity, we shall
do likewise here when indicating the size of certain contingency tables (and thus the number
of degrees of freedom for the $\chi^2$ statistics corresponding to these tables). If some of these
probabilities are zero, then the methods developed in the present paper can be modified
in a straightforward manner to obtain analogous results (see, for example, Bartlett (1951)).

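Consider now the null hypothesis $H_\mu$ within the alternate hypothesis $H_\nu$ ($0 \le \mu < \nu$). Let
$n_{\mathbf r(1), \mathbf r(2)}$ be the number of consecutive observations in the sequence of length $\nu$ which are

$$(r_1, r_2, \ldots, r_\nu) = (\mathbf r(1), \mathbf r(2)) = \mathbf r,$$

where $\mathbf r(1) = (r_1, r_2, \ldots, r_{\nu-\mu})$ and $\mathbf r(2) = (r_{\nu-\mu+1}, r_{\nu-\mu+2}, \ldots, r_\nu)$
(when $\mu = 0$, $\mathbf r(1) = \mathbf r$ and the symbol $\mathbf r(2)$ can be neglected). The alternate hypothesis is
that the sequence of observations is from a Markoff chain with probability transition matrix
defined in terms of the transition probabilities $p_{\mathbf r j} = p_{\mathbf r(1), \mathbf r(2); j}$ that the variate takes the
value $j$ at time $t$, conditional on the values having been $r_1, r_2, \ldots, r_{\nu-\mu}, r_{\nu-\mu+1}, \ldots, r_\nu$ at times
$t - \nu, t - \nu + 1, \ldots, t - \mu - 1, t - \mu, \ldots, t - 1$, respectively, and the null hypothesis $H_\mu$ is that

$$p_{\mathbf r(1), \mathbf r(2); j} = p_{\mathbf r(2); j} \text{ (unspecified) for all } \mathbf r(1). \tag{31}$$

Let

$$n_{\mathbf r(1), \mathbf r(2), \cdot} = \sum_j n_{\mathbf r(1), \mathbf r(2), j} = n_{\mathbf r\cdot}, \tag{32}$$

$$n_{\cdot, \mathbf r(2), j} = \sum_{\mathbf r(1)} n_{\mathbf r(1), \mathbf r(2), j}, \tag{33}$$

and

$$n_{\cdot, \mathbf r(2), \cdot} = \sum_j n_{\cdot, \mathbf r(2), j} = \sum_{\mathbf r(1)} n_{\mathbf r(1), \mathbf r(2), \cdot}. \tag{34}$$

Then, under the null hypothesis $H_\mu$, the asymptotic distribution of the statistic

$$\begin{aligned} F_{\nu,\mu}[\mathbf r(2)] &= \sum_{\mathbf r(1)} n_{\mathbf r(1), \mathbf r(2), \cdot} \sum_j [\hat p_{\mathbf r(1), \mathbf r(2); j} - \hat p_{\mathbf r(2); j}]^2/\hat p_{\mathbf r(2); j} \\ &= \sum_{\mathbf r(1)} \sum_j (n_{\mathbf r(1), \mathbf r(2), j} - n_{\mathbf r(1), \mathbf r(2), \cdot}\, n_{\cdot, \mathbf r(2), j}/n_{\cdot, \mathbf r(2), \cdot})^2/[n_{\mathbf r(1), \mathbf r(2), \cdot}\, n_{\cdot, \mathbf r(2), j}/n_{\cdot, \mathbf r(2), \cdot}], \end{aligned} \tag{35}$$

which is similar to the statistic used as a test of homogeneity of proportions in $s^{\nu-\mu}$ independent
samples or a test of independence in an $s^{\nu-\mu} \times s$ contingency table, will be a
$\chi^2$-distribution with $(s^{\nu-\mu} - 1)(s - 1)$ degrees of freedom (see Anderson & Goodman, 1957).
Also, the statistics $F_{\nu,\mu}[\mathbf r(2)]$, for different sets of $\mathbf r(2)$, will be asymptotically independent,
and the asymptotic distribution of the sum of these $s^\mu$ statistics, $F_{\nu,\mu} = \sum_{\mathbf r(2)} F_{\nu,\mu}[\mathbf r(2)]$, is a
$\chi^2$-distribution with $s^\mu(s^{\nu-\mu} - 1)(s - 1) = (s^\nu - s^\mu)(s - 1) = \nabla s^{\nu+1} - \nabla s^{\mu+1}$ degrees of freedom.
The likelihood ratio statistic,

$$\begin{aligned} M_{\nu,\mu} &= 2\sum_{\mathbf r(1)} \sum_{\mathbf r(2)} \sum_j n_{\mathbf r(1), \mathbf r(2), j} \log[\hat p_{\mathbf r(1), \mathbf r(2); j}/\hat p_{\mathbf r(2); j}] \\ &= 2\sum_{\mathbf r} \sum_j n_{\mathbf r j} \log n_{\mathbf r j} - 2\sum_{\mathbf r} n_{\mathbf r\cdot} \log n_{\mathbf r\cdot} - 2\sum_{\mathbf r(2)} \sum_j n_{\cdot, \mathbf r(2), j} \log n_{\cdot, \mathbf r(2), j} + 2\sum_{\mathbf r(2)} n_{\cdot, \mathbf r(2), \cdot} \log n_{\cdot, \mathbf r(2), \cdot}, \end{aligned} \tag{36}$$

is asymptotically equivalent to $F_{\nu,\mu}$ under the null hypothesis. Using the cyclic sequence,
$F_{\nu,\mu}$ is asymptotically equivalent, under the null hypothesis, to $\bar F_{\nu,\mu}$, and also to $\bar M_{\nu,\mu}$, which
is equal to

$$\begin{aligned} 2\sum_{\mathbf r} \sum_j \bar n_{\mathbf r j} \log \bar n_{\mathbf r j} &- 2\sum_{\mathbf r} \bar n_{\mathbf r} \log \bar n_{\mathbf r} - 2\sum_{\mathbf r(2)} \sum_j \bar n_{\mathbf r(2) j} \log \bar n_{\mathbf r(2) j} + 2\sum_{\mathbf r(2)} \bar n_{\mathbf r(2)} \log \bar n_{\mathbf r(2)} \\ &= \bar K_{\nu+1} - \bar K_\nu - (\bar K_{\mu+1} - \bar K_\mu) = \nabla\bar K_{\nu+1} - \nabla\bar K_{\mu+1}. \end{aligned} \tag{37}$$

It can be seen that, for long sequences, $\bar M_{\nu,\mu} = \nabla\bar K_{\nu+1} - \nabla\bar K_{\mu+1} = \bar L_\nu - \bar L_\mu$ is, under the null
hypothesis, essentially the sum of $s^\mu$ independent statistics $M_{\nu,\mu}[\mathbf r(2)]$, which are computed
in the standard manner for testing the homogeneity of proportions as though the $n_{\mathbf r(1), \mathbf r(2), j}$
were observations in $s^{\nu-\mu}$ independent samples of size $n_{\mathbf r(1), \mathbf r(2), \cdot}$ ($\mathbf r(2)$ is fixed) from multinomial
populations.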

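We have that

$$F_{\nu,\mu} = \sum_{\mathbf r(2)} \Big\{\sum_{\mathbf r(1)} \sum_j (n_{\mathbf r(1), \mathbf r(2), j} - n_{\mathbf r(1), \mathbf r(2), \cdot}/s)^2 \Big/ \Big[\frac{n_{\mathbf r(1), \mathbf r(2), \cdot}\, n_{\cdot, \mathbf r(2), j}}{n_{\cdot, \mathbf r(2), \cdot}}\Big]\Big\} - \sum_{\mathbf r(2)} \sum_j (n_{\cdot, \mathbf r(2), j} - n_{\cdot, \mathbf r(2), \cdot}/s)^2/n_{\cdot, \mathbf r(2), j}. \tag{38}$$

When $H_{-1}$ is true, then $n_{\cdot, \mathbf r(2), j}/n_{\cdot, \mathbf r(2), \cdot}$ converges in probability to $1/s$, and

$$F_{\nu,\mu} \sim G_\nu - G_\mu \sim \nabla\psi_{\nu+1}^2 - \nabla\psi_{\mu+1}^2.$$

From this discussion, we see that the strong analogy, which is mentioned by Good (1955),
between the expressions $\nabla K_{\nu+1} - \nabla K_{\mu+1}$ and $\nabla\psi_{\nu+1}^2 - \nabla\psi_{\mu+1}^2$ seems to be related to the fact
that, if the null hypothesis $H_\mu$ is true, there is a strong analogy between $F_{\nu,\mu}$, $M_{\nu,\mu}$, $\bar F_{\nu,\mu}$ and
$\bar M_{\nu,\mu} = \nabla\bar K_{\nu+1} - \nabla\bar K_{\mu+1}$; and when $H_{-1}$ is true, these expressions are asymptotically equivalent
to $G_\nu - G_\mu \sim \nabla\psi_{\nu+1}^2 - \nabla\psi_{\mu+1}^2$. When $H_{-1}$ is not true, the present writer does not see
a strong analogy between the expressions $\nabla K_{\nu+1} - \nabla K_{\mu+1}$ and $\nabla\psi_{\nu+1}^2 - \nabla\psi_{\mu+1}^2$.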
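
For the cyclic sequence, (37) expresses the likelihood ratio statistic for $H_\mu$ within $H_\nu$ as a difference of first differences of the $\bar K$'s. The Python sketch below evaluates $\nabla\bar K_{\nu+1} - \nabla\bar K_{\mu+1}$ and the associated degrees of freedom $(s^\nu - s^\mu)(s - 1)$, and compares the case $\nu = 1$, $\mu = 0$ with a direct contingency-table computation. All function names and the example sequence are assumptions of this illustration.

```python
from collections import Counter
from math import log


def k_bar(seq, nu):
    """K-bar_nu = 2 * sum of n_r log n_r over nu-tuples of the cyclic sequence."""
    n = len(seq)
    counts = Counter(tuple(seq[(t + d) % n] for d in range(nu))
                     for t in range(n))
    return 2 * sum(c * log(c) for c in counts.values())


def m_bar(seq, nu, mu):
    """Difference of first differences of K-bar, as on the right of (37)."""
    def grad(v):
        return k_bar(seq, v) - k_bar(seq, v - 1)
    return grad(nu + 1) - grad(mu + 1)


def m_bar_df(s, nu, mu):
    """(s^nu - s^mu)(s - 1) degrees of freedom."""
    return (s ** nu - s ** mu) * (s - 1)


# Hypothetical sequence with s = 3 states.
seq = [0, 0, 1, 1, 1, 0, 2, 2, 0, 1, 0, 0]
n = len(seq)
singles = Counter(seq)
pairs = Counter(zip(seq, seq[1:] + seq[:1]))
# Direct likelihood ratio statistic for independence in the cyclic table.
direct = 2 * sum(c * log(c * n / (singles[i] * singles[j]))
                 for (i, j), c in pairs.items())
via_k = m_bar(seq, 1, 0)
```

The agreement of `direct` and `via_k` reflects the fact that, for the cyclic sequence, row and column totals of the transition table both equal the state counts $\bar n_i$.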


REFERENCES 


ANDERSON, T. W. & GOODMAN, LEO A. (1957). Statistical inference about Markov chains. Ann.
Math. Statist. 28, 89-110.

BARTLETT, M. S. (1951). The frequency goodness of fit test for probability chains. Proc. Camb. Phil.
Soc. 47, 86-95.

BARTON, D. E. & DAVID, F. N. (1957). Multiple runs. Biometrika, 44, 168-77 and ‘Corrigenda’,
ibid. 534.

DAVID, F. N. (1947). A power function for tests of randomness in a sequence of alternatives. Biometrika,
34, 335-9.

GOOD, I. J. (1955). The likelihood ratio test for Markoff chains. Biometrika, 42, 531-3. ‘Corrigenda’,
Biometrika, 44, 301.

HOEL, P. G. (1954). A test for Markoff chains. Biometrika, 41, 430-3.

MOOD, A. M. (1940). The distribution theory of runs. Ann. Math. Statist. 11, 367-92.

MOORE, P. G. (1953). A sequential test for randomness. Biometrika, 40, 111-5.

WHITTLE, P. (1955). Some distribution and moment formulae for Markov chains. J.R. Statist. Soc.
Series B, 17, 235-42.

WILKS, S. S. (1944). Mathematical Statistics. Princeton University Press.










MOMENT GENERATING FUNCTIONS OF QUADRATIC FORMS 
IN SERIALLY CORRELATED NORMAL VARIABLES 


By ROY B. LEIPNIK 
Mathematics Department, University of Washington, Seattle 


1. SUMMARY 


A method used by Kac in the study of Wiener functionals is adapted to the problem of 
calculating in closed form the joint moment generating functions of linear combinations of 
quadratic forms (not simultaneously diagonable) in serially correlated normal variables. 
A class of Gaussian processes is found for which this method is successful. 

The results are worked out in detail for the special case of the Uhlenbeck-Ornstein process,
which includes first order stochastic difference equations with constant coefficient $\rho$.
Exact moments of several estimates of the variance and autocorrelation are studied.
Asymptotic results as the number $n + 1$ of observations $\to \infty$ and the interval $h$ between
observations $\to 0$ are derived under various assumptions on the limit of $T = nh$.


2. INTRODUCTION 


Testing for serial correlation depends fundamentally, as pointed out by Koopmans (1942), 
on the joint distribution of quadratic forms which cannot be simultaneously diagonalized. 
The resulting complications have led investigators to introduce various simplifications and 
approximations in order to cope with this distribution problem. These are 

(a) the circular problem (R. L. Anderson, 1942; Leipnik, 1947; Quenouille, 1949); 

(b) test for zero correlation (Von Neumann, 1941; Koopmans, 1942; Quenouille, 1949;
T. W. Anderson, 1948);

(c) regression on selected observations (Ogawara, 1951); 

(d) omission of selected observations (Durbin & Watson, 1951); 

(e) approximate distributions (R. L. Anderson, 1942; Koopmans, 1942; Rubin, 1947; 
Leipnik, 1947; Daniels, 1956; Jenkins, 1956). 

Daniels (1956) has recently obtained the exact characteristic function of several quotients 
of quadratic forms in normal variables by the use of a difference equation technique for 
calculating determinants. This method is well adapted to finding asymptotic expansions 
for the distributions. His results are closely related to some of ours. 

A general theory of estimation in stochastic processes has been constructed by Grenander 
& Rosenblatt (1952, 1953, 1954) in a series of papers, and they have derived important 
asymptotic results on the moments of quadratic forms used in estimating the spectral 
density of a stationary time series. Parzen (in an unpublished paper) has generalized their 
approach to stationary continuous processes. 

For a special class of processes, Rubin and Savage (see Rubin, 1947) have proved that
certain quadratic statistics converge in probability to process parameters as $n \to \infty$ and $h \to 0$.

None of these results, however, answers the question of the joint distribution of the 
quadratic forms in the unmodified process. 

Kac (1946) has succeeded in calculating the moment generating functions of some very
interesting Wiener functionals by using the theory of eigen solutions of symmetric integral
operators. In this paper, we adapt the method of Kac to our problem. A generalization of
the Kac method, which may have an independent interest, is included. We then obtain an
explicit closed form for the joint moment generating function of linear combinations of four
quadratic forms. These statistics include those commonly used in the estimation of serial
parameters. Moments are, of course, then obtainable. We work out these results in detail
in the case of the Uhlenbeck-Ornstein (1930) process, and write down the first two moments.
Finally, we derive some asymptotic results on the behaviour of the estimates as $n \to \infty$, $h \to 0$.

The distributions of quotients can also be obtained from the above by use of the inversion
formulas of Gurland and others. The extremely complicated form of the result suggests
that the methods of Daniels (1956) and Jenkins (1956), which lead quickly to asymptotic
expansions, are better suited to this application. However, our asymptotic results indicate
that the squared successive difference estimator of the coefficient $\rho$ due to Von Neumann
is in certain respects superior to the quotient type of estimator.


3. A CLASS OF SERIALLY CORRELATED PROCESSES 


The method of Kac can be successfully applied to an interesting class of processes, which
includes some non-Markov and non-stationary processes. A process $X(t)$ will be called
serially stationary in case

(a) $m(t) = E[X(t)] = 0$,

(b) there exist functions $a(h)$ and $v(h)$ such that $E[(X(t + h) - a(h)X(t))^2] = v(h)$ for all
$t$ and all $h > 0$,

(c) the linear combinations

$X(t_2) - a(t_2 - t_1)X(t_1), \ldots, X(t_n) - a(t_n - t_{n-1})X(t_{n-1})$
are uncorrelated whenever $t_1 < t_2 < \ldots < t_n$.

Let $r(t_1, t_2) = E[X(t_1)X(t_2)]$ be the autocovariance function of the process $X(t)$. Analytically,
three distinct possibilities exist for $a(h)$, $v(h)$ and $r(t_1, t_2)$ for serially stationary processes.
These are

(I) $a(h) = 1$, $v(h) = \sigma^2 h$, $r(t_1, t_2) = \sigma^2 \min(t_1, t_2) + A(t_1) + A(t_2)$, where $A(t)$ is arbitrary.

(II) $a(h) = \rho^h$, $v(h) = \sigma^2(1 - \rho^{2h})$, $r(t_1, t_2) = \sigma^2\rho^{|t_2 - t_1|} + \rho^{t_1 + t_2}(A(t_1) + A(t_2))$, where $\rho \ne 0, 1$,
and $A(t)$ is arbitrary.

(III) $a(h) = 0$, $v(h) = \sigma^2$, $r(t_1, t_2) = \sigma^2\delta(t_1 - t_2)$.

(I) is essentially the Wiener process which represents classical Brownian motion. It is
non-stationary, and it is Markov if and only if $A(t) = \text{const.}$ or $A(t) = \text{const.} - \sigma^2 t$. The
process of type (II) is stationary if and only if $A(t) = 0$, or $A(t) = \text{const.}\,\rho^{-2t}$, and Markov
if and only if $A(t) = \text{const.}$ or $A(t) = \text{const.} - \sigma^2\rho^{-2t}$. For $A(t) = 0$, it is a type met frequently
in applications; when Gaussian, it is called the Uhlenbeck-Ornstein process. The type (III)
process is stationary and uncorrelated, and is often called white noise. Since it is uncorrelated,
we do not consider it further except as a limiting case of (II).
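
Condition (b) can be checked numerically for a type (II) autocovariance. The sketch below assumes the form $r(t_1, t_2) = \sigma^2\rho^{|t_2 - t_1|} + \rho^{t_1 + t_2}(A(t_1) + A(t_2))$ with $a(h) = \rho^h$; this reading of the formula, the function names, and the particular $A(t)$, $\sigma^2$ and $\rho$ used are assumptions of the illustration rather than statements from the text.

```python
def r_type2(t1, t2, sigma2, rho, A):
    """Assumed type (II) autocovariance with an arbitrary function A(t)."""
    return (sigma2 * rho ** abs(t2 - t1)
            + rho ** (t1 + t2) * (A(t1) + A(t2)))


def residual_variance(t, h, sigma2, rho, A):
    """E[(X(t+h) - a(h) X(t))^2] with a(h) = rho^h, expanded via r."""
    a = rho ** h
    return (r_type2(t + h, t + h, sigma2, rho, A)
            - 2 * a * r_type2(t, t + h, sigma2, rho, A)
            + a * a * r_type2(t, t, sigma2, rho, A))


# Hypothetical parameter values and an arbitrary choice of A(t).
sigma2, rho = 2.0, 0.7


def A(t):
    return 0.3 * t + 0.1


value = residual_variance(1.0, 0.5, sigma2, rho, A)
target = sigma2 * (1 - rho ** (2 * 0.5))
```

Under these assumptions `value` equals `target` up to rounding for every $t$ and $h > 0$, so the residual variance depends on $h$ alone, as condition (b) requires, whatever the function $A(t)$.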

There is no difficulty in showing that the processes with $a(h)$, $v(h)$, and $r(t_1, t_2)$ as above
are in fact serially stationary. The converse is true also. However, deduction of expressions
of the above three types from the descriptive definition is more difficult and involves
considerable manipulation of functional equations, for which there is insufficient space here.

The most interesting subclass of the serially stationary processes is that for which there
exists an $\alpha = \alpha(\rho, t_1, t_{n+1})$ such that

(d) $X(t_1) - \alpha X(t_{n+1})$ is uncorrelated with $X(t_{j+1}) - a(t_{j+1} - t_j)X(t_j)$ for $j = 1, 2, \ldots, n$.








200 Moment generating functions of quadratic forms 


The values of $\alpha$ and the corresponding differentiable functions $A(t)$ determined by the additional requirement (d) are





A(t) = wr +A, a+1, type I, 
Ko*p* re 
A(t) = se +A, a=kp-men, type II. 


We distinguish the non-circular case, where $\alpha = 0$, $A(t) = A = \text{const.}$, and the circular case, where $\alpha \neq 0$.

This terminology is chosen because the non-circular case is closely related to the usual 
non-circular time series, and the circular case is a generalization of the circular time series 
introduced by Hotelling to simplify the distribution problem. The non-circular cases of 
both types are Markov, and the circular cases are non-Markov. Type I processes in either 
case are non-stationary, and type II processes in either case are stationary. 


4. JOINT DISTRIBUTION OF THE PROCESS 


We assume henceforth that $X(t)$ is a Gaussian, non-circular, serially stationary process. (In a later paper, we hope to consider the general circular case, which is not in all respects easier than the non-circular.) Let us now obtain the joint distribution of $X(t_1), \ldots, X(t_{n+1})$ for $t_1 < t_2 < \ldots < t_{n+1}$.

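It is convenient to define

$$v_1(t_1) = E[X^2(t_1)] = \begin{cases} \sigma^2 t_1 + 2A, & \text{type I}, \\ \sigma^2 + 2A\rho^{2t_1}, & \text{type II}. \end{cases}$$

By our basic assumptions, we have

$$\Pr[X(t_1) \le y_1,\ X(t_2) - a(t_2 - t_1)X(t_1) \le y_2,\ \ldots] = (2\pi)^{-\frac12(n+1)} \Big( v_1(t_1) \prod_{k=1}^{n} v(t_{k+1} - t_k) \Big)^{-\frac12} \int_{-\infty}^{y_1} \!\cdots\! \int_{-\infty}^{y_{n+1}} \exp\Big\{ -\frac{u_1^2}{2v_1(t_1)} \Big\} \exp\Big\{ -\sum_{k=1}^{n} \frac{u_{k+1}^2}{2v(t_{k+1} - t_k)} \Big\}\, du_1 \ldots du_{n+1}. \qquad (1)$$

The matrix of the transformation from
$$X(t_1),\ X(t_2) - a(t_2 - t_1)X(t_1),\ \ldots,\ X(t_{n+1}) - a(t_{n+1} - t_n)X(t_n)$$
to $X(t_1), X(t_2), \ldots, X(t_{n+1})$ has zeros above the main diagonal, and ones on the diagonal. Its determinant, the Jacobian of the transformation, is therefore equal to 1. Hence, the joint distribution $F_{t_1, \ldots, t_{n+1}}$ of $X(t_1), \ldots, X(t_{n+1})$ is

$$F_{t_1, \ldots, t_{n+1}}(x_1, \ldots, x_{n+1}) = (2\pi)^{-\frac12(n+1)} \Big( v_1(t_1) \prod_{k=1}^{n} v(t_{k+1} - t_k) \Big)^{-\frac12} \int_{-\infty}^{x_1} \!\cdots\! \int_{-\infty}^{x_{n+1}} M(s_1, s_2, \ldots, s_{n+1})\, ds_1 \ldots ds_{n+1}, \qquad (2)$$

where

$$M(s_1, s_2, \ldots, s_{n+1}) = \exp\Big\{ -\frac{s_1^2}{2v_1(t_1)} - \sum_{k=1}^{n} \frac{(s_{k+1} - a(t_{k+1} - t_k)s_k)^2}{2v(t_{k+1} - t_k)} \Big\}. \qquad (3)$$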


We say that the process is sampled at a constant rate if $t_{k+1} - t_k = h > 0$ for $k = 1, 2, \ldots, n$. We then have $t_{k+1} = t_1 + kT/n$, where $T = nh$ is the length of time over which the process is





observed. It is customary and convenient to assume a constant sampling rate, and we make this simplification henceforth. The normalizing constant in (2) then becomes

$$(2\pi)^{-\frac12(n+1)} \big( v_1(t_1)\, v(h)^n \big)^{-\frac12}$$

and the integrand $M$ becomes

$$M = \exp\Big\{ -s_1^2 \Big( \frac{1}{2v_1(t_1)} + \frac{a^2(h)}{2v(h)} \Big) - \frac{1 + a^2(h)}{2v(h)} \sum_{k=2}^{n} s_k^2 - \frac{s_{n+1}^2}{2v(h)} + \frac{a(h)}{v(h)} \sum_{k=1}^{n} s_k s_{k+1} \Big\}. \qquad (4)$$

Note that $M$ depends on the four quadratic forms $s_1^2$, $\sum_{k=2}^{n} s_k^2$, $s_{n+1}^2$ and $\sum_{k=1}^{n} s_k s_{k+1}$. The usual treatment of quadratic forms in mathematical statistics depends on simultaneous diagonability. It is easy to show that the only linear combinations $a s_1^2 + b \sum_{k=2}^{n} s_k^2 + c s_{n+1}^2$ simultaneously diagonable with $\sum_{k=1}^{n} s_k s_{k+1}$ are those for which $a = b = c$. Referring to (4), this condition becomes, in the present situation, $a(h) = 0$, $v_1(t_1) = v(h)$. We see that the exponent in $M$ can be diagonalized when and only when $\rho = 0$, $A = 0$. This possibility has already been exploited by Koopmans (1942) and Von Neumann (1941) in testing the hypothesis $\rho = 0$. The motive for the circular case is also, of course, simultaneous diagonability. Fortunately, the Kac method enables us to dispense with this.


5. MOMENT-GENERATING FUNCTIONS


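The statistics proposed for the estimation of $\rho$ and $\sigma^2$ in the non-circular case (except for those of Durbin & Watson, 1951) have been functions of the quadratic forms

$$P_1 = X^2(t_1), \quad P_2 = \sum_{k=2}^{n} X^2(t_k), \quad P_3 = X^2(t_{n+1}), \quad Q = \sum_{k=1}^{n} X(t_k)X(t_{k+1}). \qquad (5)$$

We see from the form of the integrand $M$ that these are the most natural. Let

$$B_i = \sum_{j=1}^{3} e_{ij}P_j + e_{i4}Q, \qquad (6)$$

where the $e_{ij}$ are arbitrary real constants. We wish to calculate the joint moment generating function

$$\Phi(\theta_1, \theta_2, \ldots) = E\Big[ \exp\Big( -\sum_i \theta_i B_i \Big) \Big] = E\Big[ \exp\Big( -\sum_{j=1}^{3} z_j P_j - wQ \Big) \Big],$$

where
$$z_j = \sum_i \theta_i e_{ij} \quad (j = 1, 2, 3) \quad \text{and} \quad w = \sum_i \theta_i e_{i4}. \qquad (7)$$

Clearly $\Phi$ is given by

$$\Phi(\theta_1, \theta_2, \ldots) = K \int_{-\infty}^{\infty} \!\cdots\! \int_{-\infty}^{\infty} M_1(s_1, \ldots, s_{n+1})\, ds_1 \ldots ds_{n+1}, \qquad (8)$$

where
$$K = (2\pi)^{-\frac12(n+1)} \big( v_1(t_1)\, v(h)^n \big)^{-\frac12}, \qquad (9)$$

$$M_1(s_1, \ldots, s_{n+1}) = M(s_1, \ldots, s_{n+1}) \exp\Big( -z_1 s_1^2 - z_2 \sum_{k=2}^{n} s_k^2 - z_3 s_{n+1}^2 - w \sum_{k=1}^{n} s_k s_{k+1} \Big). \qquad (10)$$

The possibility of applying the method of Kac rests on factorizing $M_1(s_1, \ldots, s_{n+1})$ into the form $K_0(s_1, s_{n+1}) K_1(s_1, s_2) K_2(s_2, s_3) \ldots K_n(s_n, s_{n+1})$. In the present case, this can be done with $K_j(x, y) = K(x, y) = K(y, x)$ $(j = 1, \ldots, n)$ and $K_0(x, y) = J(x)L(y)$, where

$$J(x) = \exp\{-c_1 x^2\}, \quad K(x, y) = \exp\{-c_2 x^2 + dxy - c_2 y^2\}, \quad L(y) = \exp\{-c_3 y^2\}. \qquad (11)$$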










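The coefficients $c_1, c_2, c_3, d$ are given in terms of $a(h)$, $v(h)$, $v_1(t_1)$, $z_1, z_2, z_3, w$ by

$$\begin{aligned}
c_1 &= z_1 - \tfrac12 z_2 + \frac{1}{2v_1(t_1)} + \frac{a^2(h) - 1}{4v(h)}, \\
c_2 &= \tfrac12 z_2 + \frac{a^2(h) + 1}{4v(h)}, \\
c_3 &= z_3 - \tfrac12 z_2 + \frac{1 - a^2(h)}{4v(h)}, \\
d &= -w + \frac{a(h)}{v(h)}.
\end{aligned} \qquad (12)$$

Thus we have
$$M_1(s_1, \ldots, s_{n+1}) = J(s_1) K(s_1, s_2) \ldots K(s_n, s_{n+1}) L(s_{n+1}), \qquad (13)$$

where $J$, $K$, $L$ are defined by (11), (12).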


6. THE METHOD OF KAC AND A GENERALIZATION 


Kac (1946) and Kac & Erdös (1946) devised a very ingenious method for calculating multiple integrals, which Kac used to great effect in the theory of Wiener functionals. Because of the frequent occurrence of intractable multiple integrals in mathematical statistics, it may be of interest to generalize the Kac method by separating out its essence* from the particular device that Kac used in simplifying the calculations.


Let $A(x_1, \ldots, x_{n+1})$ be the integrand of a multiple integral

$$I = \int \!\cdots\! \int A(x_1, \ldots, x_{n+1})\, \sigma(dx_1) \ldots \sigma(dx_{n+1})$$

which it is desired to calculate. If $A$ can be factorized as a product

$$A(x_1, \ldots, x_{n+1}) = A_1(x_1) \ldots A_{n+1}(x_{n+1}),$$

the problem is simplified in a familiar fashion. Suppose, however, that $A$ can be factorized as a product of functions of pairs as follows:

$$A(x_1, \ldots, x_{n+1}) = K_0(x_1, x_{n+1}) \prod_{j=1}^{n} K_j(x_j, x_{j+1}).$$

Under the further conditions
$$K_0(x, y) = J(x), \quad K_j(x, y) = K(x, y) = K(y, x) \quad \text{for } j = 1, \ldots, n,$$
Kac showed how the integral $I$ could be calculated. The essence of the method is to expand $K_0, K_1, \ldots, K_n$ into bi-orthonormal series of functions $\{\phi_j\}$, and then interchange integration and summation. The further simplification effected by Kac comes from choosing the functions $\{\phi_j\}$ as the orthonormal eigenfunctions of the symmetric kernel $K$, and expanding $J$ in orthonormal series.

Let $\{\phi_j\}$ be a complete set of orthonormal functions, so that

$$\int \phi_k(x) \phi_l(x)\, \sigma(dx) = \delta_{kl} = \begin{cases} 1, & k = l, \\ 0, & k \neq l, \end{cases}$$

and every square integrable function $g$ possesses an orthonormal expansion

$$\sum_k \Big( \int g(x) \phi_k(x)\, \sigma(dx) \Big) \phi_k(y)$$

converging in mean square to $g(y)$.


* The author is indebted to the referee for pointing out the possibility of such a separation. 





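We formally write
$$K_j(x, y) \sim \sum_{u,v} c^{(j)}_{uv} \phi_u(x) \phi_v(y),$$
where
$$c^{(j)}_{uv} = \iint K_j(x, y) \phi_u(x) \phi_v(y)\, \sigma(dx)\, \sigma(dy) \quad \text{for } j = 0, 1, \ldots, n;\ u, v = 1, 2, \ldots.$$

Let $C^{(j)}$ be the matrix $[c^{(j)}_{uv}]$ of bi-orthonormal coefficients of $K_j$, and let $C_n = C^{(1)} C^{(2)} \ldots C^{(n)}$. If summation and integration can be interchanged, we have

$$\begin{aligned}
H_n(x_1, x_{n+1}) &= \int \!\cdots\! \int K_1(x_1, x_2) \ldots K_n(x_n, x_{n+1})\, \sigma(dx_2) \ldots \sigma(dx_n) \\
&= \sum_{u_1, v_1, \ldots, u_n, v_n} c^{(1)}_{u_1 v_1} \ldots c^{(n)}_{u_n v_n}\, \delta_{v_1 u_2} \ldots \delta_{v_{n-1} u_n}\, \phi_{u_1}(x_1) \phi_{v_n}(x_{n+1}) \\
&= \sum_{u_1, v_n} \phi_{u_1}(x_1) \phi_{v_n}(x_{n+1}) (C_n)_{u_1 v_n}.
\end{aligned}$$

Hence we find
$$\begin{aligned}
I &= \iint K_0(x_1, x_{n+1}) H_n(x_1, x_{n+1})\, \sigma(dx_1)\, \sigma(dx_{n+1}) \\
&= \sum_{i,j} \sum_{u,v} c^{(0)}_{ij} (C_n)_{uv}\, \delta_{iu} \delta_{jv} \\
&= \operatorname{tr}(C^{(0)\prime} C_n) = \operatorname{tr}(C^{(0)\prime} C^{(1)} \ldots C^{(n)}).
\end{aligned}$$

The above formula constitutes our generalization of the Kac method.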
We now return to the Kac method proper. If $K_j(x, y) = K(x, y) = K(y, x) \not\equiv 0$, then by Hilbert-Schmidt theory the symmetric integral operator

$$T(f)(x) = \int K(x, y) f(y)\, \sigma(dy)$$

possesses a non-empty set of eigenfunctions $\{\phi_k\}$ and eigenvalues $\{\lambda_k\}$, for which $T(\phi_k) = \lambda_k \phi_k$. The function

$$H_n(x_1, x_{n+1}) = \int \!\cdots\! \int K(x_1, x_2) \ldots K(x_n, x_{n+1})\, \sigma(dx_2) \ldots \sigma(dx_n)$$

is none other than the iterated kernel $K^{(n)}(x_1, x_{n+1})$, which for $n > 1$ possesses the well-known expansion

$$K^{(n)}(x_1, x_{n+1}) = \sum_k \lambda_k^n \phi_k(x_1) \phi_k(x_{n+1}).$$

If $K_0(x, y) = J(x)L(y)$, and the eigenfunctions $\{\phi_k\}$ are complete, we write
$$J(x) \sim \sum_k \mu_k \phi_k(x), \quad L(y) \sim \sum_k \nu_k \phi_k(y),$$
where
$$\mu_k = \int J(x) \phi_k(x)\, \sigma(dx), \quad \nu_k = \int L(y) \phi_k(y)\, \sigma(dy).$$
Since
$$I = \iint J(x_1) L(x_{n+1}) K^{(n)}(x_1, x_{n+1})\, \sigma(dx_1)\, \sigma(dx_{n+1}),$$
we have finally
$$I = \sum_{k,l} \mu_k \nu_l \lambda_k^n \delta_{kl} = \sum_k \mu_k \nu_k \lambda_k^n.$$

Kac considered the case $L(y) = 1$, $\nu_k = \int \phi_k(y)\, \sigma(dy)$. For our calculations, we need the slight increase in generality indicated above, as can be seen from the expression (13). If $K_0$ cannot be factorized, we use the bi-orthonormal expansion $K_0(x, y) \sim \sum_{k,l} c^{(0)}_{kl} \phi_k(x) \phi_l(y)$. Then $I = \sum_k c^{(0)}_{kk} \lambda_k^n = \operatorname{tr}(C^{(0)} \Lambda^n)$, where $\Lambda$ is the diagonal matrix of eigenvalues. This last formula is of use in calculating the moment generating function in the circular case.
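
A finite-dimensional analogue may make the mechanism concrete. The sketch below is an illustration under assumptions rather than the method itself: integrals are replaced by sums over a grid of `m` points, the kernel by a random symmetric matrix, and the function names are hypothetical. It compares the chained sum with the spectral form $\sum_k \mu_k \nu_k \lambda_k^n$.

```python
import numpy as np

def chain_sum_direct(J, K, L, n):
    """Sum over x_1..x_{n+1} of J[x_1] K[x_1,x_2] ... K[x_n,x_{n+1}] L[x_{n+1}]."""
    return J @ np.linalg.matrix_power(K, n) @ L

def chain_sum_spectral(J, K, L, n):
    """Same quantity via the eigen-expansion: sum_k mu_k nu_k lambda_k^n."""
    lam, phi = np.linalg.eigh(K)   # columns of phi are orthonormal eigenvectors
    mu = phi.T @ J                 # coefficients of J in the eigenbasis
    nu = phi.T @ L                 # coefficients of L in the eigenbasis
    return np.sum(mu * nu * lam ** n)

rng = np.random.default_rng(0)
m, n = 6, 5                        # grid size and number of kernel factors (illustrative)
B = rng.normal(size=(m, m))
K = (B + B.T) / 2.0                # symmetric "kernel"
J = rng.normal(size=m)
L = rng.normal(size=m)

direct = chain_sum_direct(J, K, L, n)
spectral = chain_sum_spectral(J, K, L, n)
```

The two values agree to rounding error. In the continuous case the matrix eigenvectors are replaced by the eigenfunctions $\phi_k$ and the sums by integrals with respect to $\sigma$.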










7. APPLICATION OF THE KAC METHOD 
We now apply the above considerations to the calculation of $\Phi$. From (11) we have the symmetric kernel

$$K(x, y) = \exp(-c_2 x^2 + dxy - c_2 y^2) \qquad (14)$$

for which we wish to determine the corresponding eigenvalues and eigenfunctions. Using the normalization of the Hermite functions found in (10), we have from the Mehler double generating function that

$$\exp\Big\{ -\tfrac12 r^2 (x^2 + y^2) - \frac{r^2}{1 - u^2} \big[ u^2 (x^2 + y^2) - 2uxy \big] \Big\} = (1 - u^2)^{\frac12} \sum_{j=0}^{\infty} \frac{u^j}{2^j j!}\, h_j(rx) h_j(ry). \qquad (15)$$

On comparing the left-hand side of (15) with (14), we find that a suitable choice of $r$ and $u$ gives a bi-orthogonal expansion for $K(x, y)$. This choice is determined by the equations

$$\frac{r^2}{2} \cdot \frac{1 + u^2}{1 - u^2} = c_2, \quad \frac{2ur^2}{1 - u^2} = d. \qquad (16)$$

The solution we seek is
$$r = (4c_2^2 - d^2)^{\frac14}, \quad u = \frac{2c_2 - r^2}{d}. \qquad (17)$$

This is satisfactory in case $r \neq 0$, $|u| \le 1$, and $u \neq \pm 1$. From the known properties of the Hermite functions we find

$$\int_{-\infty}^{\infty} K(x, y) h_p(ry)\, dy = \pi^{\frac12} r^{-1} u^p (1 - u^2)^{\frac12} h_p(rx), \qquad (18)$$

so that the eigenvalues are
$$\lambda_p = \pi^{\frac12} r^{-1} u^p (1 - u^2)^{\frac12} \qquad (19)$$
and the normalized eigenfunctions are
$$\psi_p(x) = \pi^{-\frac14} 2^{-\frac12 p} (p!)^{-\frac12} r^{\frac12} h_p(rx). \qquad (20)$$

The Hermite functions are known to be complete in $L_2(-\infty, \infty)$, so that $\{\psi_p\}$ is complete for $r \neq 0$.


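We now calculate the coefficients

$$\mu_p = \int_{-\infty}^{\infty} J(x) \psi_p(x)\, dx, \quad \nu_p = \int_{-\infty}^{\infty} L(y) \psi_p(y)\, dy \qquad (21)$$

in the expansions of $J$ and $L$ of (11).

The labour involved is small, since $J$ and $L$ have the same simple exponential form. Multiplication of the generating function of the Hermite polynomials by $\exp(-vx^2)$, termwise integration, and coefficient matching yields

$$\int_{-\infty}^{\infty} e^{-vx^2} H_p(rx)\, dx = \begin{cases} \pi^{\frac12} v^{-\frac12}\, p!\, ((\tfrac12 p)!)^{-1} (r^2/v - 1)^{\frac12 p}, & p \text{ even}, \\ 0, & p \text{ odd}, \end{cases} \qquad (22)$$

and therefore

$$\int_{-\infty}^{\infty} e^{-qx^2} \psi_p(x)\, dx = \begin{cases} \pi^{\frac14} r^{\frac12} (q + \tfrac12 r^2)^{-\frac12} 2^{-\frac12 p} (p!)^{\frac12} ((\tfrac12 p)!)^{-1} \Big( \dfrac{r^2 - 2q}{r^2 + 2q} \Big)^{\frac12 p}, & p \text{ even}, \\ 0, & p \text{ odd}. \end{cases} \qquad (23)$$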





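From (11) and (21) we find

$$\mu_p \nu_p = \begin{cases} \pi^{\frac12} r \beta^{-1}\, 2^{-p}\, p!\, ((\tfrac12 p)!)^{-2} \alpha^{\frac12 p}, & p \text{ even}, \\ 0, & p \text{ odd}, \end{cases} \qquad (24)$$

where
$$\alpha = \frac{(r^2 - 2c_1)(r^2 - 2c_3)}{(r^2 + 2c_1)(r^2 + 2c_3)}, \quad \beta = \big[ (\tfrac12 r^2 + c_1)(\tfrac12 r^2 + c_3) \big]^{\frac12}. \qquad (25)$$

The series $\sum_{p=0}^{\infty} \mu_p \nu_p \lambda_p^n$ for our integral can now be summed. From (24) and (19) we have

$$\sum_{p=0}^{\infty} \mu_p \nu_p \lambda_p^n = \pi^{\frac12(n+1)} r^{1-n} (1 - u^2)^{\frac12 n} \beta^{-1} \sum_{m=0}^{\infty} \frac{(2m)!}{2^{2m} (m!)^2}\, \alpha^m u^{2mn}. \qquad (26)$$

The binomial expansion of $(1 - z)^{-\frac12}$ is

$$(1 - z)^{-\frac12} = \sum_{m=0}^{\infty} \frac{(2m)!}{2^{2m} (m!)^2}\, z^m. \qquad (27)$$

Comparing (26) and (27) we have from (8) and (9) that

$$\Phi(\theta_1, \theta_2, \ldots) = 2^{-\frac12(n+1)} (v_1(t_1))^{-\frac12} (v(h))^{-\frac12 n}\, r^{1-n} \beta^{-1} (1 - u^2)^{\frac12 n} (1 - \alpha u^{2n})^{-\frac12}, \qquad (28)$$

where $\alpha$ and $\beta$ are expressed in terms of $c_1$, $c_3$, and $r$ by (25), $u$ and $r$ in terms of $c_2$ and $d$ by (17), $c_1, c_2, c_3, d$ in terms of $z_1, z_2, z_3, w$, $a(h)$, $v(h)$, $v_1(t_1)$ by (12), and $z_1, z_2, z_3, w$ in terms of $\theta_i$ and $e_{ij}$ by (7). Note that $\Phi(\theta_1, \theta_2, \ldots)$ is an algebraic function of $\theta_1, \theta_2, \ldots$ and also of $a(h)$, $v(h)$, $v_1(t_1)$, and the $e_{ij}$.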
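
As a cross-check on the closed form, the integral in (8) can also be evaluated directly: the exponent of $M_1$ is a quadratic form in $s_1, \ldots, s_{n+1}$ with a tridiagonal matrix, so the Gaussian integral equals $\pi^{(n+1)/2}$ divided by the square root of its determinant. The sketch below compares the two routes. It takes the factor $(1 - u^2)^{n/2}$ to be part of (28), as (19) and (26) require, and the function names and numerical values are hypothetical choices for illustration.

```python
import numpy as np

def phi_closed_form(n, c1, c2, c3, d, v1, v):
    """Closed form for Phi built from (17), (25) and (28); r2 denotes r**2."""
    r2 = (4.0 * c2 ** 2 - d ** 2) ** 0.5
    u = (2.0 * c2 - r2) / d
    alpha = (r2 - 2.0 * c1) * (r2 - 2.0 * c3) / ((r2 + 2.0 * c1) * (r2 + 2.0 * c3))
    beta = ((0.5 * r2 + c1) * (0.5 * r2 + c3)) ** 0.5
    return (2.0 ** (-(n + 1) / 2) * v1 ** -0.5 * v ** (-n / 2)
            * r2 ** ((1 - n) / 2) / beta * (1.0 - u ** 2) ** (n / 2)
            * (1.0 - alpha * u ** (2 * n)) ** -0.5)

def phi_by_determinant(n, c1, c2, c3, d, v1, v):
    """K times the Gaussian integral of J K ... K L, via a tridiagonal determinant."""
    A = np.zeros((n + 1, n + 1))
    for k in range(n):                 # each factor K(s_k, s_{k+1}) of (13)
        A[k, k] += c2
        A[k + 1, k + 1] += c2
        A[k, k + 1] -= d / 2.0
        A[k + 1, k] -= d / 2.0
    A[0, 0] += c1                      # J(s_1)
    A[n, n] += c3                      # L(s_{n+1})
    gauss_integral = np.pi ** ((n + 1) / 2) / np.sqrt(np.linalg.det(A))
    K_const = (2.0 * np.pi) ** (-(n + 1) / 2) * (v1 * v ** n) ** -0.5   # as in (9)
    return K_const * gauss_integral

# Illustrative coefficients; A is diagonally dominant, so the integral converges.
args = dict(n=5, c1=0.7, c2=1.0, c3=0.4, d=0.9, v1=1.3, v=0.8)
closed = phi_closed_form(**args)
direct = phi_by_determinant(**args)
```

Agreement of `closed` and `direct` for generic coefficients supports the summation of the series in (26) and (27). The determinant route needs no eigenfunction expansion, but it does not give the algebraic dependence on $n$ that (28) makes explicit.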


8. THE UHLENBECK—ORNSTEIN PROCESS 


The only stationary Markov members of the set of serially stationary processes are those for which

$$r(t_1, t_2) = \sigma^2 \rho^{|t_1 - t_2|}, \quad a(h) = \rho^h, \quad v(h) = \sigma^2(1 - \rho^{2h}), \quad v_1(t_1) = \sigma^2.$$

Gaussian processes of this type were introduced by the physicists Uhlenbeck & Ornstein (1930) as models for Brownian motion in a gas. Since the mean is taken as zero, the parameters are $\rho$ and $\sigma^2$.

Many papers have been written on the estimation of the discrete time series $U_1, U_2, \ldots$ generated by the first-order difference equation $U_{k+1} - \gamma U_k = V_k$, where $U_1, V_1, V_2, \ldots$ are independently and normally distributed, and $E[V_k^2] = \sigma_1^2$ $(k = 1, 2, \ldots)$. This process can be obtained by sampling the U-O process as follows: choose $h > 0$, $t_1$ arbitrarily, and set $U_{k+1} = X(t_1 + kh)$ $(k = 0, 1, \ldots)$, $\gamma = \rho^h$, $E[U_{k+1}^2] = \sigma^2$, $\sigma_1^2 = \sigma^2(1 - \gamma^2)$. To study the effect of changing the sampling rate on any physical or economic phenomenon for which the above difference equation is a model, it is necessary to embed the discrete time series in the U-O process. Similarly, the effect of changing the sampling rate on a discrete circular process can be analysed by embedding it in a continuous 'circular' process discussed at the end of § 4.

The estimators for $\gamma$ and $\sigma_1^2$ have been quadratic forms in $U_1, U_2, \ldots$ or quotients of such. The distributions of such statistics are extremely complicated. The introduction contains a summary of the methods employed to obtain exact and approximate distributions of such statistics.










As an application of our methods, we will write down the joint moment generating function of the statistics suitable for estimating $\rho$ and $\sigma^2$, $\gamma$ and $\sigma^2$, or $\gamma$ and $\sigma_1^2$.

The principal statistic proposed for $\sigma^2$ has been, in the notation of § 5,

$$S_1 = \frac{1}{n+1} \sum_{j=1}^{n+1} X^2(t_j) = \frac{P_1 + P_2 + P_3}{n+1}.$$

We shall find that the statistics
$$S_\epsilon = \frac{I_\epsilon}{n + 2\epsilon - 1}, \quad \text{where } I_\epsilon = \epsilon(P_1 + P_3) + P_2,$$
particularly
$$S_{\frac12} = (1/n)\big( \tfrac12 (P_1 + P_3) + P_2 \big),$$
have advantages over $S_1$ as estimates of $\sigma^2$. Note that $E[S_\epsilon] = \sigma^2$ for each $\epsilon$, so that all the $S_\epsilon$ are unbiased.

Many estimates for $\gamma = \rho^h$ have been proposed. The ordinary correlation coefficient between $X(t_1), \ldots, X(t_n)$ and $X(t_2), \ldots, X(t_{n+1})$, which can be written as $Q[(P_1 + P_2)(P_2 + P_3)]^{-\frac12}$, is very awkward. The serial correlation coefficient, defined as

$$\frac{\sum_{j=1}^{n} X(t_j) X(t_{j+1})}{\sum_{j=1}^{n+1} X^2(t_j)},$$

is the most frequently used. Durbin & Watson (1951) have suggested numerous others, and obtained their distributions in the form of products. Daniels (1956) suggested the 'intra-class' coefficient $Q/I_{\frac12}$, which we independently arrived at.

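When $X(t)$ is a U-O process, the density $f_{t_1, \ldots, t_{n+1}}$ of the joint distribution $F_{t_1, \ldots, t_{n+1}}$ of $X(t_1), \ldots, X(t_{n+1})$ takes the form (see (2), (3), (4))

$$f_{t_1, \ldots, t_{n+1}}(s_1, \ldots, s_{n+1}) = (2\pi\sigma^2)^{-\frac12(n+1)} (1 - \rho^{2h})^{-\frac12 n} \exp\Big\{ -\frac{P_1 + P_3 + (1 + \rho^{2h})P_2 - 2\rho^h Q}{2\sigma^2(1 - \rho^{2h})} \Big\}, \qquad (29)$$

where $P_1, P_2, P_3, Q$ are evaluated at $X(t_k) = s_k$. Thus, the three quadratic forms $P_1 + P_3$, $P_2$, $Q$ are the only characteristics of the sample that enter the joint distribution, and form a set of sufficient statistics for the estimation of $\rho$ and $\sigma^2$, as Koopmans (1942) pointed out.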

We find that the maximum likelihood estimates $R$ and $S$ of $\rho^h$ and $\sigma^2$ satisfy the equations


nS(1— R?) = See ad (30) 


$$(n+1)S(1 - R^2) = P_1 + P_3 + (1 + R^2)P_2 - 2RQ.$$

We find $S = Q/R - P_2$ on elimination, and substitution yields a cubic equation for $R$ with coefficients depending on $n$, $P_1 + P_3$, $P_2$, $Q$. If $S(1 - R^2)$ is negligible compared to $nS(1 - R^2)$, we obtain the approximation $R \approx Q/P_2$.

Another type of estimate, due to Von Neumann, has many desirable properties. Let

$$N = \sum_{j=1}^{n} (X(t_{j+1}) - X(t_j))^2 = 2(I_{\frac12} - Q). \qquad (31)$$




Clearly $E(N) = 2n\sigma^2(1 - \rho^h)$, so that for known $\sigma^2$, $1 - N/(2n\sigma^2)$ is an unbiased estimator of $\rho^h$. Since $N$ is a quadratic form instead of a quotient, the calculation burdens are less than with $Q/I_\epsilon$. Furthermore, $N/T$ can be used to estimate $\rho$ itself in the limit as $h \to 0$, for

$$E[N/T] = 2\sigma^2(1 - \rho^h)/h, \quad \lim_{h \to 0} E[N/T] = -2\sigma^2 \log \rho.$$

Savage proved the strong results that $\operatorname{p.lim}_{h \to 0} N/T = -2\sigma^2 \log \rho$, and that if $\int_0^T dX(t)^2$ is defined as the limit in measure of $N$ as $h \to 0$, then $\int_0^T dX(t)^2 / T = -2\sigma^2 \log \rho$ with probability 1. Rubin (1947) extended this by sampling with a variable rate, weakening the Gaussian character, and generalizing the covariance function of the process. We will find that $I_{\frac12}$, $N$, $Q$ have the simplest moment generating functions among the quadratic statistics for the U-O process.

Moreover, the intra-class correlation $R_{\frac12} = Q/I_{\frac12} = 1 - \tfrac12 N/I_{\frac12}$ is an estimate of $\rho^h$ for unknown $\sigma^2$ with a simpler distribution than $Q/I_1$ or $Q/I_0$.

We now specialize the fundamental formula (28) to obtain the joint moment generating functions of $I_\epsilon = \epsilon(P_1 + P_3) + P_2$ and $Q$.


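We have
$$\Phi(\theta_1, \theta_2) = (2\sigma^2)^{-\frac12(n+1)} (1 - \rho^{2h})^{-\frac12 n}\, r^{1-n} \beta^{-1} (1 - u^2)^{\frac12 n} (1 - \alpha u^{2n})^{-\frac12}, \qquad (32)$$

where

$$\begin{aligned}
c_1 = c_3 &= (\epsilon - \tfrac12)\theta_1 + \frac{1}{4\sigma^2}, \qquad c_2 = \frac{\theta_1}{2} + \frac{1 + \rho^{2h}}{4\sigma^2(1 - \rho^{2h})}, \\
d &= -\theta_2 + \frac{\rho^h}{\sigma^2(1 - \rho^{2h})}, \\
r &= \Big[ \Big( \theta_1 - \theta_2 + \frac{(1 + \rho^h)^2}{2\sigma^2(1 - \rho^{2h})} \Big) \Big( \theta_1 + \theta_2 + \frac{(1 - \rho^h)^2}{2\sigma^2(1 - \rho^{2h})} \Big) \Big]^{\frac14}, \\
u &= \frac{2c_2 - r^2}{d}, \qquad \alpha = \Big( \frac{r^2 - 2c_1}{r^2 + 2c_1} \Big)^2, \qquad \beta = \tfrac12 r^2 + c_1.
\end{aligned} \qquad (33)$$

The advantage of taking $\epsilon = \tfrac12$ is apparent. Note that $\Phi(\theta_1, \theta_2)$ can be extended into the complex domain by analytic continuation.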


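The moment generating functions $\Psi_1$, $\Psi_2$, $\Psi_3$ of $I_\epsilon$, $N$, $Q$ are easily expressed in terms of $\Phi$ by

$$\begin{aligned}
\Psi_1(\theta) &= \Phi(\theta, 0), \\
\Psi_2(\theta) &= \Phi(2\theta, -2\theta)\,|_{\epsilon = \frac12}, \\
\Psi_3(\theta) &= \Phi(0, \theta).
\end{aligned} \qquad (34)$$

The means of $I_\epsilon$, $N$, $Q$ are known already:

$$\begin{aligned}
E[I_\epsilon] &= (n + 2\epsilon - 1)\sigma^2, \quad E[S_\epsilon] = \sigma^2, \\
E[N] &= 2n\sigma^2(1 - \rho^h), \\
E[Q] &= n\sigma^2\rho^h.
\end{aligned} \qquad (35)$$

The variance of $I_\epsilon$ is given by
$$\operatorname{var}[I_\epsilon] = \frac{d^2}{d\theta^2} \log \Psi_1(\theta)\,\Big|_{\theta = 0} \qquad (36)$$

and the variances of $N$ and $Q$ are given by similar formulas.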
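
The means in (35) give a convenient numerical check on (32) and (33). The sketch below is an illustration only: it codes $\Phi(\theta_1, \theta_2) = E[\exp(-\theta_1 I_\epsilon - \theta_2 Q)]$ from (32) and (33) and recovers $E[I_\epsilon]$ and $E[Q]$ by central differences at the origin. The function names, step size and parameter values are assumptions made for the example.

```python
import math

def phi_ou(theta1, theta2, n, rho, h, sigma2, eps):
    """Joint m.g.f. of I_eps and Q for the U-O process, from (32) and (33)."""
    g = rho ** h
    c1 = (eps - 0.5) * theta1 + 1.0 / (4.0 * sigma2)            # c1 = c3
    c2 = 0.5 * theta1 + (1.0 + g * g) / (4.0 * sigma2 * (1.0 - g * g))
    d = -theta2 + g / (sigma2 * (1.0 - g * g))
    r2 = math.sqrt(4.0 * c2 * c2 - d * d)                       # r**2, from (17)
    u = (2.0 * c2 - r2) / d
    alpha = ((r2 - 2.0 * c1) / (r2 + 2.0 * c1)) ** 2
    beta = 0.5 * r2 + c1
    return ((2.0 * sigma2) ** (-(n + 1) / 2) * (1.0 - g * g) ** (-n / 2)
            * r2 ** ((1 - n) / 2) / beta * (1.0 - u * u) ** (n / 2)
            / math.sqrt(1.0 - alpha * u ** (2 * n)))

def mean_from_mgf(f, step=1e-5):
    """E[Y] = -d/dtheta E[exp(-theta Y)] at theta = 0, by a central difference."""
    return -(f(step) - f(-step)) / (2.0 * step)

n, rho, h, sigma2, eps = 8, 0.7, 0.5, 2.0, 0.25   # illustrative values
total_mass = phi_ou(0.0, 0.0, n, rho, h, sigma2, eps)
mean_I = mean_from_mgf(lambda t: phi_ou(t, 0.0, n, rho, h, sigma2, eps))
mean_Q = mean_from_mgf(lambda t: phi_ou(0.0, t, n, rho, h, sigma2, eps))
expected_I = (n + 2.0 * eps - 1.0) * sigma2       # E[I_eps] from (35)
expected_Q = n * sigma2 * rho ** h                # E[Q] from (35)
```

The same differencing applied to $\log \Psi_1$ would approximate the variance in (36), although second differences are more sensitive to the choice of step.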










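After a very long and obnoxious calculation we find

$$\operatorname{var}[I_\epsilon] = \frac{\sigma^4}{(1 - \rho^{2h})^2} \big[ 2n(1 - \rho^{4h}) - (1 - \rho^{2nh})(1 + \rho^{2h})^2 + 4\epsilon_1 (1 - \rho^{2nh})(1 - \rho^{4h}) + 4\epsilon_1^2 (1 + \rho^{2nh})(1 - \rho^{2h})^2 \big], \qquad (37)$$

where $\epsilon_1 = \epsilon - \tfrac12$. Since $S_\epsilon = I_\epsilon/(n + 2\epsilon_1)$, we have $\operatorname{var}[S_\epsilon] = \operatorname{var}[I_\epsilon]/(n + 2\epsilon_1)^2$. In particular we have for $\epsilon = 0, \tfrac12, 1$ the variances

$$\begin{aligned}
\operatorname{var}[S_0] &= \frac{\sigma^4}{(n-1)^2 (1 - \rho^{2h})^2} \big[ 2n(1 - \rho^{4h}) - (1 - \rho^{2nh})(1 + \rho^{2h})^2 - 2(1 - \rho^{2nh})(1 - \rho^{4h}) + (1 + \rho^{2nh})(1 - \rho^{2h})^2 \big], \\
\operatorname{var}[S_{\frac12}] &= \frac{\sigma^4}{n^2 (1 - \rho^{2h})^2} \big[ 2n(1 - \rho^{4h}) - (1 - \rho^{2nh})(1 + \rho^{2h})^2 \big], \\
\operatorname{var}[S_1] &= \frac{\sigma^4}{(n+1)^2 (1 - \rho^{2h})^2} \big[ 2n(1 - \rho^{4h}) - (1 - \rho^{2nh})(1 + \rho^{2h})^2 + 2(1 - \rho^{2nh})(1 - \rho^{4h}) + (1 + \rho^{2nh})(1 - \rho^{2h})^2 \big].
\end{aligned} \qquad (38)$$

Similar calculations for $N$ and $Q$ give

$$\operatorname{var}[N] = \frac{4\sigma^4 (1 - \rho^h)^2}{(1 + \rho^h)^2} \big[ n(3 + \rho^{2h} + 4\rho^h) + (\rho^{2nh} - 1) \big], \qquad (39)$$

$$\operatorname{var}[Q] = \sigma^4 n \Big[ \frac{1 + 4\rho^{2h} - \rho^{4h}}{1 - \rho^{2h}} - \frac{4\rho^{2h}(1 - \rho^{2nh})}{n(1 - \rho^{2h})^2} \Big]. \qquad (40)$$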

The distribution of $Q/I_\epsilon$ can be calculated from the joint moment generating function $\Phi(\theta_1, \theta_2)$ of $I_\epsilon$ and $Q$ by the inversion formula of Gurland (1948) by which


H(a+)+H(a-) = 1-— lim (| +[)* waa 1 (41) 


sae 
where $H(a) = \Pr[Q/I_\epsilon \le a]$.
An explicit expression can be better obtained by other methods (see Daniels, 1956).

The variance results (37), (39), (40) yield a number of interesting consequences, summarized in the appended table. Seven limiting situations are considered for each of the five statistics $N/T$, $S_{\frac12} = I_{\frac12}/n$, $Q/n$, $Q/I_{\frac12}$, $N/(TS_{\frac12})$. In the upper half of the double row devoted to each of the first three statistics is found the limiting variance, and in the lower half the constant to which the statistic tends in probability, if such exists. If the limiting variance is positive and finite, we may plausibly suppose that a limiting distribution exists. For each of the last two statistics the single row contains such conclusions as can be inferred from the behaviour of the first three.

If $\sigma^2$ is known, $N/T$ is a much better estimate of $\rho$ (or rather, of $-2\sigma^2 \log \rho$) than $Q/n$ is of $\sigma^2\rho^h$ or $Q/I_\epsilon$ of $\rho^h$, since it is consistent under a much wider variety of conditions. If $\sigma^2$ is unknown, the estimates $N/(TS_\epsilon) = (n + 2\epsilon - 1)N/(nhI_\epsilon)$ appear to be better estimates of $-2\log\rho$ than $Q/I_\epsilon$ is of $\rho^h$. Of these, $N/(TS_{\frac12}) = N/(hI_{\frac12})$ would appear to have the simplest distribution. Von Neumann (1941) himself chose to investigate the distribution of $N/I_1$ for the discrete non-circular time series with $\rho = 0$, for testing the hypothesis of uncorrelation.

From the fundamental formula (28), we can write the joint moment generating function $\Phi_1$ of $I_\epsilon = \epsilon(P_1 + P_3) + P_2$ and $N = P_1 + P_3 + 2P_2 - 2Q$ as


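$$\Phi_1(\theta_1, \theta_2) = (2\sigma^2)^{-\frac12(n+1)} (1 - \rho^{2h})^{-\frac12 n}\, r^{1-n} \beta^{-1} (1 - u^2)^{\frac12 n} (1 - \alpha u^{2n})^{-\frac12}, \qquad (41)$$

where $\alpha$ and $\beta$ are expressed in terms of $c_1$, $c_3$ and $r$ by (25), $u$ and $r$ in terms of $c_2$, $d$ by (17), and

$$\begin{aligned}
c_1 &= (\epsilon - \tfrac12)\theta_1 + \frac{1}{4\sigma^2} = c_3, \\
c_2 &= \frac{\theta_1}{2} + \theta_2 + \frac{1 + \rho^{2h}}{4\sigma^2(1 - \rho^{2h})}, \\
d &= 2\theta_2 + \frac{\rho^h}{\sigma^2(1 - \rho^{2h})}.
\end{aligned} \qquad (42)$$

The greater simplicity obtained by choosing $\epsilon = \tfrac12$ is again apparent. Applying the Gurland inversion formula as in (41) to $\Phi_1(\theta_1, \theta_2)$ yields the distribution of the quotient $N/I_\epsilon$ for arbitrary $\epsilon$, $\rho$.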
Table 1 
Limit | | | 
condition |p fixed ghee — rag p fixed p->0+ p>1- 
h fixed fe age z g h->0 n fixed n fixed | 
Ba h fixed | 
n—->co To T fixed T+0 n fixed h fixed | i fixe 
Statistic 
4 
Variance 0 0 0 0 0 = (8n—1) 0 
IP | | ™ 
p. lim. | — 207 log p | —20* log p — 20 log p —2o0* log p| —20* log p lim. dist. 0 
—p2? a 2 
{ Variance 0 0 ( - . - a al 20% 0 ot(2n—1+4e,+ 4e;) co 
5, | Tlogp (T log p)? (n+ 2e,)? | 
p. lim. o o lim. dist. lim. dist. oa lim. dist. ? 
—p? o 
{ Variance 0 0 o*( _ : - —- ) foe) foe) — 00 
WIN Tlogp (T log p)* n 
\p. lim. o%ph o? lim, dist. ? 2 lim, dist. ? 
iI, p. lim. ph 1 lim. dist. ? ? lim, dist. ? 
N/TS, p. lim. —2logp —2 log p lim. dist. lim. dist. —2logp lim, dist. 










The question of which $S_\epsilon$, for $\epsilon$ in $[0, 1]$, gives the best estimate of $\sigma^2$ is also of interest. On grounds of theoretical simplicity, $\epsilon = \tfrac12$ is indicated. Numerical computation of $\operatorname{var}[S_\epsilon]$ by means of (37) et seq. has been carried out for $\epsilon = 0, 0.1, 0.2, \ldots, 0.9, 1$; $n = 3, 4, 5, \ldots, 10, 15, 20, 30, \ldots, 100$; $\rho = 0, 0.01, \ldots, 0.05, 0.10, \ldots, 0.90, 0.95, \ldots, 0.99$; $h = 0.001, 0.01, 0.1, 1$. The general conclusion is that $y_n = \operatorname{var}[S_\epsilon]/\sigma^4$ tends to its limit 2 very rapidly, nearly independent of $\epsilon$, $\rho$, $h$. More precisely, for $0 < \rho < 0.95$, all $\epsilon$, all $h$, $n > 3$, we have $|y_n - 2| < 0.04$, and for $n > 10$, $|y_n - 2| < 0.01$. However, for $0.95 < \rho < 0.99$, $y_n$ has a flat minimum for $0.1 < \epsilon < 0.9$, rising about 40% for $0 < \epsilon < 0.1$ and $0.9 < \epsilon < 1$, all $h$, and $3 < n < 10$. In the same $\rho$, $h$ range, $y_n$ is practically constant for $n > 20$. Thus the choice of $\epsilon$ cannot be made on numerical grounds either when $0 < \rho < 0.95$, or when $n > 20$, in the range of $h$ considered. For $0.95 < \rho < 0.99$ and $n < 20$, $y_n$ increases considerably for $\epsilon$ near 0 or 1, which seems to support the contention that $\epsilon = \tfrac12$ (intra-class statistics) is best for non-circular correlation.




Thanks are expressed to Mr Rod Smart for checking (37) and to the staff of the 701 Com- 
puter, U.S. Naval Ordnance Test Station, China Lake, California, for computational 
assistance. 










REFERENCES 


ANDERSON, R. L. (1942). Distribution of the serial correlation coefficient. Ann. Math. Statist. 13, 1-13.

ANDERSON, T. W. (1948). On the theory of testing serial correlation. Skand. Aktuar. Tidskr. 31, 88-116.

DANIELS, H. E. (1956). The approximate distribution of serial correlation coefficients. Biometrika, 43, 169-85.

DOOB, J. L. (1944). The elementary Gaussian process. Ann. Math. Statist. 15, 83-106.

DURBIN, J. & WATSON, G. S. (1951). Exact tests of serial correlation using non-circular statistics. Ann. Math. Statist. 22, 446-51.

GRENANDER, U. & ROSENBLATT, M. (1952). Spectral analysis of stationary time series. Proc. Nat. Acad. Sci., Wash., 38, 519-21.

GRENANDER, U. & ROSENBLATT, M. (1953). Statistical spectral analysis of time series arising from stationary stochastic processes. Ann. Math. Statist. 24, 537-58.

GRENANDER, U. & ROSENBLATT, M. (1954). An extension of a theorem of G. Szegö and its application to the study of stochastic processes. Trans. Amer. Math. Soc. 76, 112-26.

GURLAND, J. (1948). Inversion formulae for the distribution of ratios. Ann. Math. Statist. 19, 228-37.

JENKINS, G. M. (1956). Tests of hypotheses in the linear autoregressive model. II. Biometrika, 43, 186-99.

KAC, M. (1946). On the average of a certain Wiener functional and a related limit theorem in the theory of probability. Trans. Amer. Math. Soc. 59, 401-14.

KAC, M. & ERDÖS, P. (1946). On certain limit theorems in the theory of probability. Bull. Amer. Math. Soc. 52, 292-302.

KOOPMANS, T. (1942). Serial correlation and quadratic forms in normal variables. Ann. Math. Statist. 13, 14-33.

LEIPNIK, R. B. (1947). Distribution of the serial correlation coefficient in a circularly correlated universe. Ann. Math. Statist. 18, 80-7.

NEUMANN, J. V. (1941). Distribution of the ratio of the mean square successive difference to the variance. Ann. Math. Statist. 12, 367-95.

OGAWARA, M. (1951). A note on the test of the serial correlation coefficient. Ann. Math. Statist. 22, 115-18.

QUENOUILLE, M. H. (1949). The joint distribution of serial correlation coefficients. Ann. Math. Statist. 20, 561-71.

RUBIN, H. (1947). Some results on the distribution of quadratic forms from Gaussian stochastic processes. Abstract no. 11, Ann. Math. Statist. 18, 610.

UHLENBECK, G. E. & ORNSTEIN, L. S. (1930). On the theory of the Brownian motion. Phys. Rev. 36, 823-41.







MOMENTS OF SAMPLE MOMENTS OF CENSORED SAMPLES 
FROM A NORMAL POPULATION 


By J. G. SAW 
University College London 


1. INTRODUCTION


Given a sample $y_1, y_2, \ldots, y_n$ of size $n$ from a normal population, it is sometimes found that only the smallest $r$ (or largest $r$) of these are available for statistical tests concerning the parameters of the population. In the case where the parent population is normal, mean s, variance $\sigma^2$ for example, and we are given only the $r$ smallest observations, we may be required to differentiate between a 'null' and an 'alternative' hypothesis concerning one of the parameters, the other being either numerically specified or unspecified.

Should the whole sample of $n$ be available, the appropriate maximum likelihood test function is well known in each case and is seen to be a simple function of the first two k-statistics of the sample.

Although the test functions appropriate to the statistical tests described above are considerably more complex when only a censored sample is available, it is found that these too may be written as a function of the first two k-statistics together with the $r$th smallest ordered variable. In order to find moments of these test functions we shall first require moments of the k-statistics, and the work which follows gives a method of finding the latter as an expansion in powers of $n^{-1}$.

The appropriate test functions for the four main single-sample tests which may arise when dealing with a normal population are being studied with regard to their distribution and power. For example, the simplest case which may occur is where we require to test for the location of a normal population, the standard deviation being known. In this instance, the distribution of a certain linear sum of censored mean and $r$th ordered variable is very close to normality, while the asymptotic relative efficiency is 81.83% when 50% of the sample is available and 95.63% when 80% of the sample is available. It is intended to give a full account of the test functions for location, standard deviation being either known or unknown, and for dispersion when the mean is either known or unknown, in a later paper.

David & Johnson (1954) give formulae which approximate to the moments and product-moments of ordered variates, but state that these may not be suitable for extreme values (which are of course contained in the k-statistics of a singly censored sample). We shall show in the first part of this paper that our problem may be reduced to a simple extension of theirs.

§ 4 deals algebraically with the approximation to a set of integrals of a general type by a power series in $(n+2)^{-1}$, the integrals being fundamental to the solution of the problem. Precise algebraic expressions are given which occur under normal theory. In § 5 are tables of the coefficients of $(n+2)^0, (n+2)^{-1}, \ldots, (n+2)^{-5}$ occurring in this power series for certain of the set of integrals and various values of the ratio $r/(n+1)$. In § 6 some comparisons are made between estimates provided by our series, taken to the term in $(n+2)^{-5}$, and some exact values given by Teichroew (1956) for $n = 19$; $k = 10, 11, \ldots, 16$.












Finally, it should be mentioned that the methods described in this paper may be used 
for distributions other than normal and for doubly censored samples. 


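2. NOTATION


We take $y_1, y_2, \ldots, y_n$ as a random sample of $n$ from a unit normal population, mean zero. $x_r$ will be used to denote the $r$th smallest of the $y$'s; $x'_1, x'_2, \ldots, x'_{r-1}$ will be used to denote the $r-1$ $y$'s smaller than $x_r$, and are subject only to the condition

$$x'_i \le x_r \quad (i = 1, 2, \ldots, r-1),$$

being randomly ordered amongst themselves.

The k-statistics, $k_1$ and $k_2$, are defined in the usual way, viz.

$$k_1 = \frac{1}{r} \Big( x_r + \sum_{i=1}^{r-1} x'_i \Big), \quad k_2 = \frac{1}{r(r-1)} \Big[ r \Big( x_r^2 + \sum_{i=1}^{r-1} x'^2_i \Big) - \Big( x_r + \sum_{i=1}^{r-1} x'_i \Big)^2 \Big].$$

We shall write

$$Z(x) = \frac{1}{\sqrt{2\pi}} \exp(-\tfrac12 x^2), \quad F(x) = \int_{-\infty}^{x} Z(u)\, du, \quad p_r = r/(n+1).$$

For reasons which will become clear later, we define a function

$$\psi(p_r, n: a, b) = \frac{n!}{(r-1)!\,(n-r)!} \int_{-\infty}^{\infty} \Big( \frac{Z(x)}{F(x)} \Big)^a x^b\, [F(x)]^{r-1} [1 - F(x)]^{n-r} Z(x)\, dx,$$

which will usually be abbreviated to $\psi_{ab}$ when the values of $r$ and $n$ are clear from the context.

We notice that

$$p(x'_i \mid x_r) = Z(x'_i)/F(x_r), \quad -\infty < x'_i \le x_r \quad (i = 1, 2, \ldots, r-1),$$

$$p(x_r) = \frac{n!}{(r-1)!\,(n-r)!}\, [F(x_r)]^{r-1} [1 - F(x_r)]^{n-r} Z(x_r) \quad (-\infty < x_r < +\infty).$$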


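3. RELATIONSHIP BETWEEN MOMENTS AND THE INTEGRALS $\psi(p_r, n: a, b)$

Moments about zero of the k-statistics may be obtained as a finite linear sum of integrals of the type $\psi(p_r, n: a, b)$, which we shall demonstrate by application to the first moment of $k_2$. Writing $k_2$ in the form

$$k_2 = \frac{1}{r} \Big\{ \sum_{i=1}^{r-1} x'^2_i + x_r^2 - \frac{1}{r-1} \sum_{i \ne j} x'_i x'_j \Big\} - \frac{2x_r}{r(r-1)} \sum_{i=1}^{r-1} x'_i, \qquad (3.1)$$

since $p(x'_i \mid x_r) = Z(x'_i)/F(x_r)$ and $x'_i$, $x'_j$ are independent, then

$$E(k_2 \mid x_r) = \frac{1}{r} \Big\{ (r-1) \Big[ 1 - \frac{x_r Z(x_r)}{F(x_r)} \Big] + x_r^2 - [r-2] \Big[ \frac{Z(x_r)}{F(x_r)} \Big]^2 + 2x_r \frac{Z(x_r)}{F(x_r)} \Big\}. \qquad (3.2)$$

But it will be seen that
$$E_{x_r} \Big[ \Big( \frac{Z(x_r)}{F(x_r)} \Big)^a x_r^b \Big] = \psi_{ab}, \qquad (3.3)$$

whence
$$E(k_2) = \frac{1}{r} \{ (r-1)\psi_{00} - (r-3)\psi_{11} + \psi_{02} - (r-2)\psi_{20} \}. \qquad (3.4)$$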







The value of $\psi_{00}$ is of course unity, but the symbol has been left in the formula (3.4) in view of equation (3.8), which gives a relation between the product-moment $E(k_2 x_r^s)$ and the integrals $\psi_{ab}$.

Proceeding according to this example, we obtain the following relations for the first four moments of $k_1$ and the first two of $k_2$:

$$\begin{aligned}
E(k_1)\, r &= -(r-1)\psi_{10} + \psi_{01}, \\
E(k_1^2)\, r^2 &= (r-1)^2 \psi_{20} + (r-1)(\psi_{00} - 3\psi_{11} - \psi_{20}) + \psi_{02}, \\
E(k_1^3)\, r^3 &= -(r-1)^3 \psi_{30} + (r-1)^2 (6\psi_{21} - 3\psi_{10} + 3\psi_{30}) \\
&\quad + (r-1)(\psi_{10} + 3\psi_{01} - 7\psi_{12} - 6\psi_{21} - 2\psi_{30}) + \psi_{03}, \qquad (3.5)
\end{aligned}$$
E (kh) r* = (r—1)* Yryg + (7 — 1)? (6Ya9 — 105, — 6 Yao) 
+ (17 —1)? (Byqq + 25Yrog— 18py, + 30Y%g; + L149 — 10Y29) 
+ (r—1) (4Yro9 + Yon + TY 11 — 8Yray — 20g, — 2529 — 15yy5) + Vor 
$E(k_2)\, r = (r-1)\psi_{00} - (r-3)\psi_{11} + \psi_{02} - (r-2)\psi_{20},$
&(K3) r2(r —1) = (7 — 1) Yrog + (Sr? — 107 + 1) Wy + (7 — 2) (7? — 127 + 25) Yoo 
+ (r— 2) (r—3) (2r—10) Wg, + (r—4) (7-2) (7-3) Wap (3-6) 
— (2r3 — Or? + 20r — 25) ry, — (7 — 2) (27? — 47 + 10) Woo 
+ (2r?2 — 4r + 6) Woot (73 — 17 +17 —3) Woo- ) 








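The following formulae, which are of interest, may also be noted:

$$\begin{aligned}
\mu'_1(x_r) &= \psi_{01}, \\
\mu_2(x_r) &= \psi_{02} - \psi_{01}^2, \\
\mu_3(x_r) &= \psi_{03} - 3\psi_{02}\psi_{01} + 2\psi_{01}^3, \\
\mu_4(x_r) &= \psi_{04} - 4\psi_{03}\psi_{01} + 6\psi_{02}\psi_{01}^2 - 3\psi_{01}^4.
\end{aligned} \qquad (3.7)$$

The product-moment of any k-statistic with $x_r$ may be written in the form
$$E(k_i^j x_r^s) = \nabla^s E(k_i^j), \qquad (3.8)$$
where $\nabla^s$ is the operator such that
$$\nabla^s \psi_{ab} = \psi_{a, b+s}, \qquad (3.9)$$

so that, for example,

$$E(k_2 x_r)\, r = (r-1)\psi_{01} - (r-3)\psi_{12} + \psi_{03} - (r-2)\psi_{21}.$$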


4. A METHOD OF EVALUATING THE INTEGRAL $\psi(p_r, n: a, b)$


It has been demonstrated in § 3 how evaluation of moments of k-statistics may be reduced to the evaluation of a set of integrals of the same form, namely $\psi(p_r, n: a, b)$. In a few isolated cases the integrals are known; for example, we may deduce from symmetry that

$$\psi(\tfrac12, 2m+1: 0, 2b+1) = 0 \quad (m = 0, 1, \ldots, \infty;\ b = 0, 1, \ldots, \infty),$$

that is, all odd moments of the median are zero.

Methods of quadrature may be applied to the integral, although it is to be expected from the form of the integral and range of integration that a large number of ordinates would be









necessary to produce reasonable accuracy. Be that as it may, a more serious disadvantage of applying quadrature is that each of $r$, $n$, $a$ and $b$ must have its numerical value specified explicitly.
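
For orientation, the quadrature route can be sketched as follows. This is an illustration under assumptions, not the procedure of this paper: it uses a plain trapezoidal rule on $[-10, 10]$ with a fixed grid, and the function name `psi` and its default arguments are hypothetical. Note that $r$, $n$, $a$ and $b$ must all be supplied, which is the disadvantage just mentioned.

```python
import math
import numpy as np

def psi(r, n, a, b, lo=-10.0, hi=10.0, points=20001):
    """Trapezoidal estimate of psi(p_r, n: a, b) = E[(Z(x_r)/F(x_r))**a * x_r**b]."""
    x = np.linspace(lo, hi, points)
    Z = np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)
    # erfc keeps F accurate (and strictly positive) in the lower tail
    F = np.array([0.5 * math.erfc(-xi / math.sqrt(2.0)) for xi in x])
    const = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    f = const * (Z / F) ** a * x ** b * F ** (r - 1) * (1.0 - F) ** (n - r) * Z
    dx = x[1] - x[0]
    return dx * (f.sum() - 0.5 * (f[0] + f[-1]))

psi_00 = psi(3, 5, 0, 0)            # total probability, should be unity
median_mean = psi(3, 5, 0, 1)       # first moment of the median of 5, zero by symmetry
# For an uncensored sample (r = n), k_1 is the ordinary mean, so
# E(k_1) r = -(r-1) psi_10 + psi_01 should vanish.
full_sample_mean = -(5 - 1) * psi(5, 5, 1, 0) + psi(5, 5, 0, 1)
```

The three checks use only facts stated in the text: $\psi_{00} = 1$, the vanishing odd moments of the median, and the expression for $E(k_1)$ in terms of $\psi_{10}$ and $\psi_{01}$.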

It will be shown that there is a method of closely approximating to the true value of the integral when only $a$, $b$ and the ratio $r/(n+1)$ are numerically specified, by equating it to a truncated power series in $(n+2)^{-1}$. This approximation to the integral follows a method used by David & Johnson (1954).

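We see that
\[ \psi(p_r, n: a, b) = E_\beta\!\left[\left(\frac{Z(x)}{F(x)}\right)^{a} x^{b}\right], \qquad (4.2) \]
where $E_\beta$ denotes expectation over $F = F(x)$, which follows the β-distribution with density
\[ \frac{1}{B(r, n-r+1)}\, F^{r-1}(1-F)^{n-r}, \qquad 0 \le F \le 1. \]

Now F is densest around $F = r/(n+1) = p_r$, say, and we define $X_r$ by $F(X_r) = p_r$ and put
\[ \delta F = F - E(F) = F - p_r. \qquad (4.3) \]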


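Using an inverse Taylor expansion we may, by expanding about $X_r$, write
\[
\begin{aligned}
x &= c_0 + c_1\,\delta F + c_2\,\delta F^2 + \ldots + c_j\,\delta F^j + \ldots,\\
Z(x)/F(x) &= d_0 + d_1\,\delta F + d_2\,\delta F^2 + \ldots + d_j\,\delta F^j + \ldots,
\end{aligned}
\qquad (4.4)
\]
whence
\[ \left(\frac{Z(x)}{F(x)}\right)^{a} x^{b} = \sum_{t=0}^{\infty} (c^b d^a)_t\, \delta F^t, \qquad (4.5) \]
$(c^b d^a)_t$ being the coefficient of $\delta F^t$ in $\left(\sum_0^\infty c_j\,\delta F^j\right)^b \left(\sum_0^\infty d_j\,\delta F^j\right)^a$, so that by (2.1) and (2.4)
\[ \psi(p_r, n: a, b) = \sum_{t=0}^{\infty} (c^b d^a)_t\, M_t, \qquad (4.6) \]
when this converges and where $M_t = E(\delta F)^t$ is the tth central moment of the β-distribution of equation (4.2).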

If r, n, a, b are each numerically specified, we have here an alternative method to that of quadrature for evaluating $\psi(p_r, n: a, b)$, for by taking the first (say) eleven terms of the sum (4.6), since $M_t$ is of order $n^{-[\frac{1}{2}(t+1)]}$ (where [ ] denotes the integral part), then $\sum_{t=11}^{\infty} (c^b d^a)_t M_t$ is of order $n^{-6}$; thus we may approximate as closely as desired to $\psi(p_r, n: a, b)$ by taking a sufficient number of terms of the sum, at least for n sufficiently large.

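However, if only $p_r$, a, b are specified numerically, we see that since $c_j$ and $d_j$ are simple functions of $p_r$, $X_r$ and $Z_r = Z(X_r)$ (which are given below, Table 1) and since
\[ M_t = \sum_{i=0}^{\infty} m_{ti}\,(n+2)^{-i}, \qquad (4.7) \]
where $m_{ti}$ is a function of $p_r$ only (given in Table 2), we have
\[ \psi(p_r, n: a, b) = \sum_{i=0}^{l} \sum_{t} (c^b d^a)_t\, m_{ti}\,(n+2)^{-i} + O(n+2)^{-l-1} \qquad (4.8) \]
\[ \phantom{\psi(p_r, n: a, b)} = \sum_{i=0}^{l} H_i(p_r, a, b)\,(n+2)^{-i} + O(n+2)^{-l-1}, \qquad (4.9) \]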





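where
\[ H_i(p_r, a, b) = \sum_{t=0}^{2i} (c^b d^a)_t\, m_{ti}, \qquad (4.10) \]
since $m_{ti} = 0$ for $t > 2i$. Now the $\{c_j\}$ are seen to be just polynomials in $X_r$ multiplied by $1/(Z_r^j\, j!)$; thus if we put
\[ c_j Z_r^j\, j! = \sum_i X_r^i h_{ij} \qquad (4.11) \]
we may table the integers $h_{ij}$.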


Table 1. Values of h_ij for 0 ≤ j ≤ 10

 j \ i          0          1          2          3          4          5          6          7          8          9
   0            0          1
   1            1
   2            0          1
   3            1          0          2
   4            0          7          0          6
   5            7          0         46          0         24
   6            0        127          0        326          0        120
   7          127          0      1,740          0      2,556          0        720
   8            0      4,369          0     22,404          0     22,212          0      5,040
   9        4,369          0    102,164          0    290,292          0    212,976          0     40,320
  10            0    243,649          0  2,080,644          0  3,890,484          0  2,239,344          0    362,880











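We note the relation
\[ h_{ij} = (j-1)\,h_{i-1,\,j-1} + (i+1)\,h_{i+1,\,j-1}; \]
the proof is easily effected by induction. The d's are more readily expressible in terms of the c's, thus:
\[
\begin{aligned}
d_0 &= Z_r p_r^{-1},\\
d_1 &= -Z_r p_r^{-2} - c_0 p_r^{-1},\\
2!\,d_2 &= 2Z_r p_r^{-3} + 2c_0 p_r^{-2} - c_1 p_r^{-1},\\
3!\,d_3 &= -6Z_r p_r^{-4} - 6c_0 p_r^{-3} + 3c_1 p_r^{-2} - 2c_2 p_r^{-1}, \text{ etc.,}
\end{aligned}
\]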


and a useful recurrence relation for the d's, which may be proved by induction, is
\[ d_{j+1} = -\frac{1}{p_r}\left[d_j + \frac{c_j}{j+1}\right]. \]
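The recurrences for the $h_{ij}$ and for the d's lend themselves to a short numerical sketch. The version below is an illustration under stated assumptions, not the author's computing procedure: the function names `h_table` and `c_and_d` are hypothetical, the unit normal quantile and ordinate are taken from Python's `statistics.NormalDist`, and the d's are generated from $d_{j+1} = -[d_j + c_j/(j+1)]/p_r$, which follows from $F \cdot (Z/F) = Z$ together with $dZ/dF = -x$.

```python
import math
from statistics import NormalDist

def h_table(jmax):
    """Rows j = 0..jmax of the integers h_ij (coefficient of X^i in c_j Z^j j!),
    built from h_ij = (j-1) h_{i-1,j-1} + (i+1) h_{i+1,j-1}, starting at c_0 = X."""
    width = jmax + 2                      # room for the highest power of X
    h = [[0] * width for _ in range(jmax + 1)]
    h[0][1] = 1                           # c_0 = X_r
    for j in range(1, jmax + 1):
        for i in range(width):
            below = h[j - 1][i - 1] if i >= 1 else 0
            above = h[j - 1][i + 1] if i + 1 < width else 0
            h[j][i] = (j - 1) * below + (i + 1) * above
    return h

def c_and_d(p, jmax):
    """Coefficients c_j and d_j of the expansions of x and Z(x)/F(x)
    in powers of dF = F - p about the point F = p."""
    X = NormalDist().inv_cdf(p)           # X_r, defined by F(X_r) = p_r
    Z = NormalDist().pdf(X)               # Z_r = Z(X_r)
    h = h_table(jmax)
    c = [sum(hij * X**i for i, hij in enumerate(h[j]))
         / (Z**j * math.factorial(j)) for j in range(jmax + 1)]
    d = [Z / p]                           # d_0 = Z_r / p_r
    for j in range(jmax):
        d.append(-(d[j] + c[j] / (j + 1)) / p)
    return c, d

h = h_table(10)
c, d = c_and_d(0.60, 10)
```

Summing $c_j\,\delta F^j$ for a small $\delta F$ should reproduce the normal deviate at $F = p_r + \delta F$, which gives an independent check on the tabled integers.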
In passing, it is worth noting that if r and n are specified numerically so that we are interested in the summation (4.6) where $M_t$ must be known, as opposed to the case where we know numerically only the ratio r/(n+1) and so use the summation (4.8) involving the $m_{ti}$, then the recurrence relation
\[ (n+t+1)\,M_{t+1} = t\left[\frac{n-2r+1}{n+1}\,M_t + \frac{r(n-r+1)}{(n+1)^2}\,M_{t-1}\right] \]
provides the easiest way of calculating the $M_t$. The following table gives values of $m_{ti}$ in terms of
\[ u_r = 1 - 2p_r = (n-2r+1)/(n+1), \qquad v_r = p_r q_r = r(n-r+1)/(n+1)^2. \]

In addition we note that
\[
\begin{aligned}
& m_{0i} = 0,\; i > 0; \qquad m_{00} = 1;\\
& m_{1i} = 0 \text{ for all } i;\\
& m_{2i} = 0,\; i > 1; \qquad m_{21} = v_r;\\
& m_{ti} = 0,\; t > 2i.
\end{aligned}
\]


Table 2. Values of m_ti in terms of p_r (here u = u_r, v = v_r)

  t \ i    2        3              4                     5
   3       2uv      -2uv           2uv                   -2uv
   4       3v^2     6u^2v - 6v^2   12v^2 - 18u^2v        42u^2v - 24v^2
   5       Zero     20uv^2         24u^3v - 92uv^2       332uv^2 - 144u^3v
   6       Zero     15v^3          130u^2v^2 - 90v^3     120u^4v - 1070u^2v^2 + 420v^3
   7       Zero     Zero           210uv^3               924u^3v^2 - 2142uv^3
   8       Zero     Zero           105v^4                2380u^2v^3 - 1260v^4
   9       Zero     Zero           Zero                  2520uv^4
  10       Zero     Zero           Zero                  945v^5
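As a check on the two routes to $M_t$, the sketch below generates exact central moments of the β-distribution from the recurrence $(n+t+1)M_{t+1} = t[u_r M_t + v_r M_{t-1}]$ and compares $M_4$ with the truncated series built from the t = 4 row of Table 2. The function names are hypothetical, and r = 11, n = 19 are chosen only for illustration.

```python
from fractions import Fraction

def beta_central_moments(r, n, tmax):
    """Exact central moments M_0..M_tmax of F ~ beta(r, n-r+1),
    from (n+t+1) M_{t+1} = t [u M_t + v M_{t-1}], with M_0 = 1, M_1 = 0."""
    p = Fraction(r, n + 1)
    u = 1 - 2 * p
    v = p * (1 - p)
    M = [Fraction(1), Fraction(0)]
    for t in range(1, tmax):
        M.append(t * (u * M[t] + v * M[t - 1]) / (n + t + 1))
    return M[:tmax + 1], u, v

def m4_series(u, v, n):
    """M_4 approximated by the t = 4 row of Table 2, powers (n+2)^-2 .. (n+2)^-5."""
    eps = 1.0 / (n + 2)
    u, v = float(u), float(v)
    return (3 * v**2 * eps**2
            + (6 * u**2 * v - 6 * v**2) * eps**3
            + (12 * v**2 - 18 * u**2 * v) * eps**4
            + (42 * u**2 * v - 24 * v**2) * eps**5)

M, u, v = beta_central_moments(r=11, n=19, tmax=4)
gap = abs(float(M[4]) - m4_series(u, v, 19))   # of order (n+2)^-6
```

The exact route suits case work with given r and n; the series route needs only $p_r$ and is the one that feeds the $H_i$.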























5. NUMERICAL VALUES OF THE COEFFICIENTS $H_i(p_r, a, b)$


Numerical values of the coefficients of $(n+2)^{-i}$, that is the $H_i(p_r, a, b)$ occurring in equation (4.9), which arise in the expansion of $\psi(p_r, n: a, b)$, have been obtained using the methods described, for values of a + b ≤ 4, p = 0.50(0.05)0.80 and i = 0(1)5; these are given in Table 4 printed on pp. 218-221. It is thought that these figures are at most two out in the last place of decimals given.
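To show how a row of Table 4 is used, the sketch below sums the series (4.9) for a = 0, b = 1, p_r = 0.55 and n = 19 (so r = 11). The six coefficients are transcribed from Table 4; the function name `psi_from_H` is hypothetical.

```python
def psi_from_H(H, n):
    """Sum H_0 + H_1 (n+2)^-1 + ... + H_l (n+2)^-l for one row of Table 4."""
    eps = 1.0 / (n + 2)
    return sum(h_i * eps**i for i, h_i in enumerate(H))

# Row a = 0, b = 1, p_r = 0.55 of Table 4 (i = 0, 1, ..., 5).
H_a0_b1_p055 = [0.1256613469, 0.099262368, 0.14089642,
                0.1552289, 0.097953, -0.02220]

psi_01 = psi_from_H(H_a0_b1_p055, n=19)   # approximates E(x_11) for n = 19
```

The result can be compared with the entry for r = 11 under $\psi_{01}$ in Table 3.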


6. ACCURACY OF THE TABLES 


In an article edited by Teichroew (1956), the values of $E(x_i: N)$ and $E(x_i x_j: N)$ have been evaluated by quadrature to ten places of decimals.* Since
\[ E(x_r) = \psi_{01}, \quad E(x_r^2) = \psi_{02}, \quad E(\bar{x}_{r-1}) = -\psi_{10}, \quad E(\bar{x}_{r-1} x_r) = -\psi_{11}, \]
where $\bar{x}_{r-1} = \sum_{i=1}^{r-1} x_i/(r-1)$, we may compare values of $\psi_{01}$, $\psi_{02}$, $\psi_{10}$, $\psi_{11}$ obtained

(i) using Teichroew's Tables,
(ii) using the first six terms of (4.9), i.e. with l = 5.

The results are shown in Table 3 for n = 19, r = 10, 11, ..., 16. The error, i.e. difference of (ii)-(i), for these particular values of a and b is seen to be, at worst, five in the seventh significant figure, although the results often agree to eight significant figures.


* In his article, Teichroew uses $x_i$ as the ith largest observation in a sample of N from a unit normal population.







Table 3. Comparison of approximate values of ψ_ab with true values for n = 19

                ψ_01                          ψ_02
  r      By (ii)          Error        By (ii)          Error
 10      Zero             Zero         +0.0807 9094     + 4
 11      +0.1307 2488     + 0          +0.0983 7659     + 4
 12      +0.2637 4289     + 0          +0.1523 9428     + 3
 13      +0.4016 4227     + 0          +0.2469 3379     + 2
 14      +0.5477 0736     + 1          +0.3900 0519     + 4
 15      +0.7066 1142     + 6          +0.5960 9419     + 7
 16      +0.8858 6168     +28          +0.8922 2497     +53

                ψ_10                          ψ_11
  r      By (ii)          Error        By (ii)          Error
 10      +0.8066 8488     + 0          -0.0510 6715     + 2
 11      +0.7260 1640     + 0          +0.0459 6047     - 2
 12      +0.6481 3083     + 0          +0.1237 4801     - 2
 13      +0.5721 4135     + 0          +0.1840 6949     - 3
 14      +0.4972 3492     + 0          +0.2278 3843     + 1
 15      +0.4225 9618     + 0          +0.2551 3535     - 3
 16      +0.3473 1563     + 4          +0.2650 5331     + 4

N.B. The 'error' is: (result from (i) - result from (ii)) × 10^8.

7. SUMMARY AND COMMENT

In § 3, we derived explicit relations between moments of sample moments of a censored normal sample and certain integrals of the type $\psi_{ab}$.

§ 4 divides itself into two parts corresponding to two cases which may arise.

Case (i). Arises when we are given r and n numerically and we wish to determine the moments of sample moments. In this event we approximate to the integral by taking a sufficient number of terms of an infinite sum (equation (4.6)), noting that if we take the first 2l + 1 terms of the sum the error is of order $(n+2)^{-l-1}$. It is then only necessary to substitute the numerical values of $\psi_{ab}$ into the appropriate equations (3.5)-(3.8) to obtain numerical values of the moments for the given r and n.

Case (ii). It would clearly be laborious to prepare tables of $\psi_{ab}$ for all combinations of r and n, even though we may restrict r and n to be less than 20, and this restriction would severely limit the application of the tables. It has therefore been assumed that only values of r/(n+1), a, and b are given numerically and an expansion for $\psi_{ab}$ has been obtained as a power series in $(n+2)^{-1}$, the coefficients of which are functions of $p_r$, a, b only (equation (4.9)). We are then in a position to evaluate $\psi_{ab}$ for any values of r and n satisfying $r/(n+1) = p_r$. Moreover, by using this treatment we are better able to study the moments of the sample moments and related problems suggested in the introduction, as n becomes large.










In § 5, as a consequence of the remarks made above in case (ii), values of $H_i(p_r, a, b)$ have been obtained for p = 0.50(0.05)0.80 and a + b ≤ 4. These are given in Table 4 and provide a means of obtaining the value of $\psi_{ab}$ for a large range of values of r and n.

It will often be required to interpolate when $p_r = r/(n+1)$ is not one of the values given. In these cases, it will be found more convenient to work out the numerical values of $\psi(p, n: a, b)$ for the tabulated values of p on either side of $p_r$ and then to interpolate in these values, rather than to interpolate in Table 4 for $H_i(p_r, a, b)$ and then to obtain $\psi(p_r, n: a, b)$.

In § 6, certain values of $\psi(p_r, 19: a, b)$ are compared with some known results for a = 0, b = 1, 2 and a = 1, b = 0, 1. It will be seen that for n as small as 19, good agreement is obtained. This is because for these values of a and b, $H_i(p_r, a, b)$ remains of order one. When a or b takes higher values of 3 or 4, where some $H_i(p_r, a, b)$ are of order 100, it is thought that n = 50 will give similar accuracy.


I wish to express my gratitude to Dr F. N. David and Dr D. E. Barton for their suggestions 
and constructive criticisms which have proved invaluable in preparing this paper. 


REFERENCES 


David, F. N. & Johnson, N. L. (1954). Statistical treatment of censored data. Biometrika, 41, 228.
Teichroew, D. (1956). Tables of expected values of order statistics for samples of size 20 and less from the normal distribution. Ann. Math. Statist. 27, 410.


Table 4. Values of H_i(p_r, a, b)

a = 1; b = 0
 p_r     i = 0             1                2               3              4            5
 0.50    +0.79788 45608    +0.17122 7492    +0.26759 482    +0.35180 53    +0.350832    +0.20642
 0.55    +0.71964 52336    +0.12315 2558    +0.20870 247    +0.29503 63    +0.311988    +0.18241
 0.60    +0.64390 42225    +0.08049 2217    +0.16088 721    +0.25335 77    +0.277806    +0.12888
 0.65    +0.56984 46222    +0.04185 6190    +0.12255 155    +0.22605 22    +0.245949    +0.03288
 0.70    +0.49670 37346    +0.00620 0557    +0.09337 578    +0.21432 10    +0.208964    -0.14341
 0.75    +0.42370 20969    -0.02729 4186    +0.07461 594    +0.22205 48    +0.145752    -0.49572
 0.80    +0.34995 24005    -0.05929 8026    +0.07032 069    +0.25798 16    -0.009978    -[illegible]

a = 0; b = 1
 p_r     i = 0             1                2               3              4            5
 0.50    Zero              Zero             Zero            Zero           Zero         Zero
 0.55    +0.12566 13469    +0.09926 2368    +0.14089 642    +0.15522 89    +0.097953    -0.02220
 0.60    +0.25334 71031    +0.20368 1761    +0.29158 668    +0.31973 56    +0.186023    -0.08377
 0.65    +0.38532 04663    +0.31947 2780    +0.46443 172    +0.50459 63    +0.244160    -0.24771
 0.70    +0.52440 05127    +0.45547 1782    +0.67845 307    +0.72508 33    +0.220097    -0.64694
 0.75    +0.67448 97502    +0.62618 5313    +0.96821 269    +1.00430 49    -0.042989    -1.60931
 0.80    +0.84162 12335    +0.85903 0815    +1.40772 958    +1.37735 35    -1.098516    -4.05425

\[ \psi(p_r, n: a, b) = \sum_{i=0}^{l} H_i(p_r, a, b)\,(n+2)^{-i} + O(n+2)^{-l-1} \]































































































Table 4. (cont.)

a = 2; b = 0
 p_r     i = 0     1     2     3     4     5
(0:50 | +0-63661 97724 + 0-909859317 + 1:33469475 + 1-7791261 | + 2-069240| + 1-97734 
| -55 | + -51788 92622 + -76187 8569 + 1-06994056 + 1-3903600 + 1-643594 | + 1-68973 | 
= 0; | 60 | + -41461 26478 | + -640365184 | + 085548320 + 1-0804216 + 1-309290 | + 1-44291 
ied. | 0-65 | +0-32472 28935 | + 053896 3236 + 0-67652198 + 0-8269100 | + 1-052264 | + 1-24495 | 
or b 10} + -24671 46000 | + -45301 1344 + -52330977 | + -61503 80 | + 0-869321 | + 1-09066 | 
hat | -75 | + -1795234669 | + 378879236 | + -38403 267 + 4884837 | + -775759 | + 0-92477 
| 80 | + -12246 66826 | + -31345 8958 + -25360495 | + -29626.20 | + 815012 | + -55508 
| | | 
a = 1; b = 1
 p_r     i = 0     1     2     3     4     5
0:50 Zero — 1-00000 0000 | — 1-42920 367 | — 1-82021 61 1-99786 2 | — 1-86363 
228, 55 | +0-09043 15893 | — 0-87413 9992 | — 1-18344855 | — 1-42280 77 | 1-52243 4 | — 1-56585 
less | 60} + -16313 12604 | — -777425743  — 0-98742 960 — 1-09126 82 | — 1-11786.0 | — 1-35488 
| 
0:65 | +0-21957 27955 | — 0-70438 3338 | — 0-82519724 | — 0:7945753  — 0-763704 | — 1-31062 
70 | + -26047 16931 | — -651554388 | — -68346466 | — -5031620| — -452869 | — 1-61535 
15 | + -28578 27215 | — -617059012 | — -54803008 | — -1768624 | — -222519 | — 2-72755 
80 | + -20452 73710 | — -60052 6374 | — -8072.511 | + -2501997 | —  -28533.0 | — 614333 
| | ' ! | 
a = 0; b = 2
 p_r     i = 0     1     2     3     4     5
| 0-50 | Zero | + 1-57079 6327 | + 2-46740110 | + 3-4627311 | + 4-130238 | + 3-91102 
| “56 | +0-01579 07741 | + 1-60478 6203 | + 253037 148 + 3-57861 74) + 4246716 | + 3-91800 
od | 60 | + -06418 47546 | + 1-71113 0875 | + 276775770 | + $¥485106 | + 4-608022 | + 3-88450 
72 | 0-65 | +0-14847 18617 | + 1-90441 7391 + 319527863 + 4-6492039 | + 5242518 | + 3-58194 
“ 10 | + -27499 58977 | + 2-21481 3445 | + 3-91515 342 + 5-8511050 | + 6170748 | + 229136 
75 | + -45493 64231 | + 2-70147 8618 | + 5-12583 162 | + 7-9260148 | + 7-281723 | — 2-29282 
| + 70832 63007 | + 3-48732 8685 | + 7-28989853 | + 11-7694044 | + 7-563796 | —19-32806 
| eS yee Y | Rae 
a = 3; b = 0
 p_r     i = 0     1     2     3     4     5
0:50 | +0-50794 90875 | + 1-85086 768 | + 36647898 | + 6360611 | + 9-40919 | +11-4188 
‘55 | + -37269 65391 | + 1-45350868 + 28133674 | + 4813982 | + 7-08631 | + 8-7461 
1 60 | + -26697 08346 | + 1-13688 226 | + 21622770 | + 3661605 | + 535159 | + 6-6626 
4 | 
1 0:65 | +0-18504 15946 | + 0-88060092 | + 1-65535 15 | + 2783574 | + 402009 | + 5-0269 
5 ‘70 | + -12254 40632 | + -67044 798 | + 1-25708 01 | + 2104912 | + 295750 | 4 3-7747 
715| + -07606 44694 | + -49629562 | + 0-9378512 | + 1561264 | + 210444 | + 2-8067 
— ‘80 | + -04285 75096 | + -35087324 | + 0-6899212 | + 1-120422 | + 1-36757 | + 2-3292 


Table 4. (cont.) 


a=2; b=1 





re 



























































 p_r     i = 0     1     2     3     4     5
| 0-50 Zero — 1-59576 912 — 3-30804 40 — 5-97305 2 — 911597 — 11-2436 
-55 | +0-06507 86622 | — 1-23608 335 — 2-51107 26 — 4-50162 9 — 686359 — 8-5452 
-60 | + -10504 09132 | — 0-94965 140 — 1-90464 95 — 3419357 — 519733 — 6-4028 
0-65 | +0-12512 23767 | — 0-71722 563 — 1-43603 41 — 2-61204 0 — 3-92587 — 4-6289 
| 70 | + +12937 72627 | — -52530 129 — 1-07090 88 — 2-01405 3 — 2-89284 — 3-0937 
| +75 | + -12108 67383 | — -36416 278 — 0-79022 53 — 1-559797 — 2-00796 — 1:5385 
| +80 | + +1030705605 | — -22677 045 — +58259 47 — 1-234401 — 1-02001 — 0-0486 
a = 1; b = 2
 p_r     i = 0     1     2     3     4     5
| 0-50 Zero + 1-25331 414 + 2-77559 18 + 5363198 + 875962 +11-4214 
| +55 | +0-01136 37553 | + 0-91528 794 + 1-99816 59 + 3-94408 6 + 6-66400 + 8-9169 
| +60 | + -04132 88345 | + -63626 717 + 1-39118 67 + 2-918210 + 523275 + 17-0043 
0-65 | +0-08460 58919 | + 0-39588 585 + 0-90292 68 + 2-192519 + 432775 + 53958 
| +70 | + -13659 14894 | + -17777 506 + -50048 90 + 1-735359 + 3-90170 + 3-5681 
‘75 | + +19275 75164 | — -03326 657 + +16593 29 + 1587061 + 3-97664 + 0-2068 
“80 | + +24788 04893 | — -25444 627 — +09931 85 + 1-928250 + 458381 — 95105 
a = 0; b = 3
 p_r     i = 0     1     2     3     4     5
| 0-50 Zero Zero Zero Zero Zero Zero 
-55 | +0-00198 42899 | + 0-60027 650 + 1-56370 33 + 3216610 + 521046 + 61271 
| -60 | + -01626 10216 | + 1-26131 036 + 333024 39 + 6-92451 6 + 11-21360 + 12-7370 
| 
0-65 | +0-05720 92470 | + 2-05913 484 + 556844 85 + 11-80109 6 + 19-07752 + 20-1380 
‘70 | + +14420 79897 | + 310858930 | + 872357 20 + 19-03930 5 + 30-60294 + 27-6817 
| +75 | + +30684 99544 | + 4-61173540 | +13-67166 50 + 31-16362 9 + 49°24546 + 30-0902 
‘80 | + -59614 24549 | + 6-97960725 | +22-44708 89 + 54-56974 8 + 82-85634 — 3-8336 
a = 4; b = 0
 p_r     i = 0     1     2     3     4     5
| 0-50 | +0-40528 47346 | + 2-77960 780 + 835771 08 + 18-68579 9 + 34-48894 + 52-9242 
-55 | + +26820 92879 | + 2-00022 412 | + 6°11033 59 + 13-48275 0 + 24-65797 + 38-1209 
60 | + -17190 36477 | + 1-42110 843 | + 447700 29 + 977832 5 + 17-75862 + 28-8821 
0-65 | +0-10544 49576 | + 0-98812 112 + 3-26979 78 + 17:07742 6 + 12-79818 + 20-0808 
‘70 | + -06086 80938 | + -66450 834 + 2-36773 64 + 5081057 + 916367 + 14-5179 
*75 | + -03222 86752 | + -42471 523 + 1-68335 56 + 3-55091 1 + 6-48876 + 10-3589 
-80 | + -01499 80883 | + -25066 060 | + 1-16945 84 + 2-38785 3 + 451560 + 7:0893 































Table 4. (cont.) 
















































































a = 3; b = 1
 p_r     i = 0     1     2     3     4     5
0-50 Zero — 1-90985 932 — 7:09859 32 — 16-96858 2 — 32-40443 — 50-9313 
‘55 | +0-04683 35491 | — 1-27350 658 — 511160 20 — 12-13978 0 — 22-98248 — 36-4677 | 
‘60 | + -06763 62876 | — 0-81308 492 — 3-68910 30 — 8-72355 1 — 16-41022 — 26-2976 | 
0-65 | +0-07130 03135 | — 0-4808! 7-7 — 2-65710 05 — 6251132 — 11-69023 —19-1328 | 
‘70 | + :06426 21696 | — -+24469 604 — 1-90132 05 — 4-42831 5 — 8-04888 — 12-9536 | 
‘75 | + -05130 47049 | — -08292 942 — 1-34719 88 — 3-:025503 | — 5-78160 — 10-3959 | 
‘80 | + -03606 97901 | + -01937 275 — 6-93334 29 — 1-93697 6 | — 3-97473 — 8-3546 
| | 
a = 2; b = 2
 p_r     i = 0     1     2     3     4     5
0-50 Zero + 1-00000 000 + 5-85840 73 + 15:33682 8 + 30°35326 + 48-7535 
‘55 | +0-00817 78724 | + 0:49549 537 + 4:13485 63 + 10-89204 3 + 21-30323 + 34-4544 
‘60 | + -02661 18111 | + -14438 212 + 2-93188 34 + 17°775807 + 14-97114 + 24-7003 
0-65 | +0-04821 22125 | — 0-09428 277 + 2-08993 73 + 5°52768 5 + 10°39306 +17-8873 | 
‘70 | + 06784 55029 | — -24694 137 + 1-50603 71 + 3°84205 1 + 7:01095 +13-4316 | 
‘15 | + -08167 17639 | — -33028 075 + 1-10871 67 + 2-52058 0 + 443318 +11-6188 
‘80 | + -08674 63723 | — -+35374 087 + 0-84487 81 + 1:359217 + 2-69868 + 13-3810 
a = 1; b = 3
 p_r     i = 0     1     2     3     4     5
0-50 Zero Zero — 471238 90 — 14:137167 — 28-84632 — 46-1684 | 
‘55 | +0-00142 79848 | + 0-38670 336 — 3-30395 97 — 10°215910 -- 20°17752 — 31-5595 | 
‘60 | + -01047 05405 | + -63459 486 — 2-40504 86 — 17-57432 2 — 13-96098 — 20-7527 | 
0-65 | +0-03260 03817 | + 0°77376 664 — 1-88537 35 — 5-76988 2 | — 9-06185 — 12-2097 | 
‘70 | + -07162 86470 | + -81809 452 — 1-67503 46 — 4466891 | — 4-53293 — 51436 
| +75 | + +13001 29691 | + -+76647876 | — 1-74795 44 — 3°342897 + 0:84506 — 0°2535 
‘80 | + -20862 14831 | + 59831 364 | — 2°11574 24 — 1-803214 + 9°56395 — 5°7245 
a = 0; b = 4
 p_r     i = 0     1     2     3     4     5
0-50 | Zero Zero | + 7:40220 33 + 23-953439 | + 52-76060 + 90-1904 | 
55 | +0-00024 93485 | + 0-15046918 | + 8-0341915 | + 25-816437 + 56-89313 + 96-7325 | 
60 | + +00411 96827 | + -63247 450 +10-08419 24 | + 31-938907 + 70-57389 + 118-0744 | 
0°65 | +0-02204 38937 | + 1-55030 000 | +14:09144 07 | + 44-232797 + 98-41969 + 160-0718 | 
‘10 | + +07562 27437 | + 312892631 | +21-2827261 | + 67-253048 | +151-62993 + 234-9842 
‘75 | + -20696 71491 | + 5-83684 664 | +34-42516 69 | +112-00183 6 | +257-67709 + 364-3768 | 
80 | + -50172 61482 | + 10-72416 185 + 60-44444 81 | + 208-72446 7 +495-24379 + 561-6867 
| 







THE RELATION BETWEEN THE DICTIONARY DISTRIBUTION 
AND THE OCCURRENCE DISTRIBUTION OF WORD LENGTH 
AND ITS IMPORTANCE FOR THE STUDY OF 
QUANTITATIVE LINGUISTICS 


By G. HERDAN, M.Sc., Ph.D., LL.D.


Lecturer in Statistics, University of Bristol 


One of the many unsolved problems of quantitative linguistics is the relation between the 
vocabulary and the occurrence distribution of specified linguistic forms, i.e. between the 
frequency distribution in the dictionary and that of occurrence in the spoken or written 
language. 

For different linguistic forms, such as phonemes, phoneme combinations (morphemes), 
word length (in terms of syllable, phoneme, letter number), the answer may conceivably be 
different. The present investigation deals with the characteristic of word length in terms of 
phoneme* and letter number per word, and arrives at the conclusion that the occurrence 
distribution can be regarded as a moment distribution of the vocabulary distribution. 
This will be shown to be a consequence of the log normality of word length distributions. 

Williams (1940, 1946) has drawn attention to the log normality of certain linguistic distributions as an empirical fact. In this paper the hypothesis of log normality is extended to a set of related linguistic variables, viz. the distributions of both word occurrence and vocabulary according to word length in terms of the number of letters as well as phonemes. The comparison of these distributions as lognormal variates reveals the hypothesis of log normality as being of great value for the study of quantitative linguistics, practical and theoretical.


I 


The material for the investigations was provided by a count of approximately 80,000 word 
occurrences obtained from telephone conversations by French, Carter and Koenig of the 
Bell Telephone Co. (1930). The distributions according to word length of the 76,054 word 
occurrences and of the 738 vocabulary items, that is different words, contained in the sample 
in terms of both letters and phonemes are shown in the following table (from Herdan, 1956), 
where $p_i$ is the percentage of words containing i units and $x_i$ the average frequency of such words in the spoken language.

As shown below, if the occurrence distribution is a moment distribution of the vocabulary 
distribution, their standard deviations must be sensibly equal, which will result in the 
parallelism of the lines representing these distributions on a logarithmic probability grid. 
For the phoneme distribution this is the case; for the letter distribution they differ by 0.050, the lines being slightly convergent.

* The phoneme is the smallest linguistic unit with distinctive function in the spoken language; it corresponds in magnitude to letter in written language. It is well known that in English the correspondence between letters and phonemes is not always very close: thus the written form of the word 'thought' requires seven letters, but the spoken form is a sequence of only three phonemes: 'θɔt'.

The plot of these distributions on a log probability grid whose abscissa has a logarithmic scale and whose ordinate is the Gaussian integral, is in each case a sensibly straight line, which suggests log normality of the variates (Fig. 1). If the hypothesis of log normality is correct, they should comply with the characteristic features of such distributions (Aitchison & Brown, 1957). As the most characteristic feature (and peculiar to lognormal variates*) we take the moment distribution property, according to which the jth moment distribution of a lognormal distribution with parameters $\mu$ and $\sigma^2$ is also a lognormal distribution with parameters $\mu + j\sigma^2$ and $\sigma^2$, respectively. Writing for the logarithmic distribution function $\Lambda(x \mid \mu, \sigma^2)$, where $\mu$ is the logarithmic mean and $\sigma$ the logarithmic standard deviation, we define the jth moment distribution function as
\[ \Lambda_j(x \mid \mu, \sigma^2) = \frac{1}{\lambda'_j} \int_0^x u^j \, d\Lambda(u \mid \mu, \sigma^2), \]
where $\lambda'_j = e^{j\mu + \frac{1}{2}j^2\sigma^2}$ is the jth moment about zero.















































Table 1

                                 Letters                                   Phonemes
 No. of units, i,      Vocabulary        Occurrence           Vocabulary        Occurrence
 per word              p_i     Σ p_i     p_i x_i  Σ p_i x_i   p_i     Σ p_i     p_i x_i  Σ p_i x_i
   1                    0.3      0.3      8.0       8.0        0.4      0.4      8.0       8.0
   2                    3.2      3.5     20.9      28.9        8.1      8.5     36.8      44.8
   3                    9.8     13.3     25.2      54.1       30.8     39.3     35.0      79.8
   4                   26.5     39.8     26.2      80.3       23.4     62.7     11.4      91.2
   5                   19.1     58.9      9.45     89.8       14.6     77.3      4.4      95.7
   6                   15.6     74.5      4.36     94.1        8.0     85.3      1.6      97.2
   7                   10.0     84.5      2.65     96.6        6.0     91.3      1.3      98.5
   8                    6.1     90.6      1.87     98.5        4.5     95.8      0.8      99.3
   9                    4.2     94.8      0.8      99.3        2.7     98.4      0.5      99.8
  10                    2.6     97.4      0.46     99.8        1.1     99.5      0.08     99.96
  11                    1.9     99.3      0.25     99.95       0.14    99.7      0.01     99.98
  12                    0.4     99.7      0.03     99.99       0.27   100.0      0.02    100.0
  13                    0.2    100.0      0.01    100.0        -        -        -         -

 Mean of log10 i           0.703             0.494                0.608             0.414
 S.D. of log10 i           0.168             0.218                0.189             0.187

(Each Σ column is the sum from i = 1 to k, i.e. the cumulative percentage.)














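Proof of property (Aitchison & Brown, 1957).
\[
\begin{aligned}
\Lambda_j(x \mid \mu, \sigma^2) &= \frac{1}{\lambda'_j} \int_0^x u^j \, d\Lambda(u \mid \mu, \sigma^2)\\
&= e^{-j\mu - \frac{1}{2}j^2\sigma^2} \int_0^x e^{j \log u}\, \frac{1}{u\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2\sigma^2}(\log u - \mu)^2\right\} du\\
&= \int_0^x \frac{1}{u\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2\sigma^2}(\log u - \mu - j\sigma^2)^2\right\} du\\
&= \Lambda(x \mid \mu + j\sigma^2, \sigma^2).
\end{aligned}
\]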


* Though not exclusively so. 








It follows that the graphs of different moment distributions should be parallel lines on log-probability paper, with a distance of $j\sigma^2$ between them.
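The moment distribution property can be checked numerically. The sketch below integrates $u^j\,d\Lambda(u \mid \mu, \sigma^2)$ by Simpson's rule on the scale t = log u and compares the result with the normal integral taken at mean $\mu + j\sigma^2$. The function names, the parameter values and the choice of Simpson's rule are illustrative assumptions, not part of the original argument.

```python
import math

def normal_cdf(z):
    """Gaussian integral up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def moment_distribution(x, mu, sigma, j, steps=2000):
    """Lambda_j(x | mu, sigma^2) by Simpson's rule in t = log u (steps must be even)."""
    lam = math.exp(j * mu + 0.5 * j**2 * sigma**2)      # jth moment about zero
    lo = min(mu, mu + j * sigma**2) - 12.0 * sigma      # below this the mass is negligible
    hi = math.log(x)
    if hi <= lo:
        return 0.0

    def g(t):
        dens = math.exp(-0.5 * ((t - mu) / sigma)**2) / (sigma * math.sqrt(2.0 * math.pi))
        return math.exp(j * t) * dens                   # u^j times the density of log u

    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3.0 / lam

mu, sigma, j, x = 1.4, 0.43, -2.4, 3.0                  # illustrative values only
lhs = moment_distribution(x, mu, sigma, j)
rhs = normal_cdf((math.log(x) - (mu + j * sigma**2)) / sigma)
```

The two quantities should agree to within the quadrature error, for negative as well as positive j.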

Turning now to the distributions of Table 1 and Fig. 1, we find the logarithmic probability graphs for vocabulary and occurrences for both letters and phonemes, to be sensibly parallel lines. If what is plotted are different moment distributions, the distance between the vocabulary and occurrence lines must be that required by the moment theorem.

















[Figure 1: logarithmic probability plot. Ordinate: accumulated total as percentage of whole sample. Abscissa: number of letters (phonemes) per word.]

Fig. 1. ○, word occurrence against phoneme number; ●, vocabulary against phoneme number; +, word occurrence against letter number; ×, vocabulary against letter number.


What is plotted against log word length for vocabulary is $\sum_{i=1}^{k} p_i$, and for occurrence $\sum_{i=1}^{k} p_i x_i$, where $p_i$ is the number of vocabulary items (different words) of length i, and $x_i$ the average number of occurrences of a word of length i. In an efficient code, such as language may be regarded to represent, word length appears to be inversely related to frequency of occurrence by a function of the form
\[ f(i) = (a/i^k)\, b^i, \]
where a, b and k are constants, and where b is so close to unity that as a first approximation the formula may be written as
\[ f(i) = a i^{-k} \]
(Good, 1951; Simon, 1955; Mandelbrot, 1954). In our notation $x_i = a i^{-k}$. If i stands for length of English words in terms of letters or phonemes, k appears to be between 2 and 3, and we assume it to be 2.4. We write therefore
\[ x_i \propto i^{-2.4}. \]
The log probability graphs of $\sum_{i=1}^{k} p_i$ for vocabulary and $\sum_{i=1}^{k} p_i x_i$ ($\propto \sum_{i=1}^{k} p_i\, i^{-2.4}$) for occurrences plotted against log i thus represent respectively the basic frequency distribution and the -2.4th moment distribution for vocabulary items according to log word length. If the variates are log normal these graphs may, therefore, be expected to show the characteristic relation between distributions of different moments explained above. This is actually the case. Moreover, if the distribution of occurrence represents the -2.4th moment of the vocabulary distribution, the averages of the two distributions, as the 0th and -2.4th moment means, should be $2.4\sigma^2$ apart, that is the occurrence mean should equal $\mu - 2.4\sigma^2$; as will be seen, there is good agreement between observation and theory in this respect, the differences between the respective means being sensibly equal to $2.4\sigma^2$.

Table 2 gives in column 3 the observed difference between the vocabulary and occurrence means, and in column 5 the theoretical value.




















Table 2 

| Logarithmic Logarithmic Average 
| mean Difference S.D. s.p. (3) — 2-4 x 2-3026 x 8? | 

| (base 10) (base 10) ae 
ee a | 
Phonemes | | 
Vocabulary | 0-608 0-187 | . | 
Occurrences | gait dies Sue on agate 
Letters | | | 
Vocabulary 0-703 0-168 ‘ | 
Occurrences 0:494 ~~ sau | one Bio | 





The theoretical difference between the moment averages as given in the text is in terms of natural logarithms. If logarithms to the base 10 are used, the following transformation must be observed (Herdan, 1953). Since the mean and standard deviation of the log distribution are the logarithms of the geometric mean G and the geometric standard deviation, $\sigma_g$, respectively, viz.
\[ \mu = \ln G, \qquad \sigma = \ln \sigma_g, \]
the relation
\[ \mu_j = \mu + j\sigma^2 \]
can be written as
\[ \ln G_j = \ln G + j \ln^2 \sigma_g. \]
Transforming into logarithms to the base 10, we have
\[ \log_{10} G_j = \log_{10} G + 2.3026\, j \log_{10}^2 \sigma_g, \]
or
\[ \mu_j = \mu + 2.3026\, j\sigma^2, \]
with $\mu_j$, $\mu$ and $\sigma$ now expressed in logarithms to the base 10.

The last column of Table 2 shows $2.3026\, j\sigma^2$ with j = -2.4 and with the mean values, $\bar{s}$, from the preceding column substituted for $\sigma$. These figures are in good agreement with the observed differences given in the third column.
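The arithmetic behind this comparison can be sketched as follows, using the logarithmic means and standard deviations (base 10) printed at the foot of Table 1. The function name and dictionary layout are hypothetical, and the standard deviation inserted for σ is the average of the vocabulary and occurrence values, as described above.

```python
import math

LN10 = math.log(10.0)          # 2.3026, the factor converting to base 10

def theoretical_shift(j, s_log10):
    """Shift of the logarithmic mean (base 10) for the jth moment distribution."""
    return LN10 * j * s_log10**2

# Means and standard deviations of log10(i) from the foot of Table 1.
data = {
    "letters":  {"vocab_mean": 0.703, "occ_mean": 0.494, "vocab_sd": 0.168, "occ_sd": 0.218},
    "phonemes": {"vocab_mean": 0.608, "occ_mean": 0.414, "vocab_sd": 0.189, "occ_sd": 0.187},
}

results = {}
for name, row in data.items():
    s_bar = 0.5 * (row["vocab_sd"] + row["occ_sd"])      # average S.D.
    observed = row["occ_mean"] - row["vocab_mean"]       # occurrence minus vocabulary
    results[name] = (observed, theoretical_shift(-2.4, s_bar))
```

Each entry of `results` holds the observed and the theoretical difference of logarithmic means, both negative because the occurrence distribution lies to the left of the vocabulary distribution.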

We may, therefore, take it as a reasonable hypothesis supported by some statistical 
evidence that: 

(a) the distribution of vocabulary and that of occurrence against word length in terms of 
number of letters or phonemes satisfy the criteria for log normality; 

(b) the distribution of occurrences may be regarded as the -2.4th moment distribution of vocabulary according to word length.

The practical importance of this lies in the fact that knowing either the vocabulary or the occurrence distribution, it is possible to obtain immediately an estimate of the other distribution. Let us assume we had made a vocabulary count and wanted an estimate of the distribution of occurrences of that vocabulary in a continuous text of a certain length. Knowing the logarithmic mean $\mu$ and the logarithmic standard deviation $\sigma$ of the vocabulary count, we calculate the logarithmic mean of the occurrence distribution as $\mu - 2.4\sigma^2$, and knowing that the distributions are lognormal with the same standard deviation, we can immediately construct the straight line on log probability paper which represents the cumulative distribution of occurrences. Working backwards from this, we obtain the distribution function. This may be of interest in connexion with mechanical translation. Given a vocabulary with a certain distribution of words according to their length, we can quickly obtain an estimate of the distribution of occurrences in a text of a given size.

Considering that the method rests upon the assumption of an inverse relation between 
frequency and word length, which may be true to a different degree for different material, 
it follows that the multiple of 2-4 will give a satisfactory solution only to the extent to 
which the assumption is in agreement with the facts. The less true the assumption of strict 
reciprocity, the more will the multiple differ from 2-4 and it will, therefore, in general, be 
necessary to ascertain by a pilot investigation or from the literature how far the reciprocity 
assumption is true, in order to choose the correct multiple of $\sigma^2$. For determining the
constant, Mandelbrot’s (1953) or Simon’s (1955) formula will be found useful. 

Conversely, from the distance of the log probability graphs for vocabulary and occurrence 
distribution, inferences may be drawn about the extent to which the reciprocity relation 
(Zipf, 1949; Simon, 1955), is true for a given language. 


II 
It is of interest to compare the two distributions with regard to the information they contain. Considering that in an efficient coding system, $\log i \propto \log 1/p_i$ ($p_i$ standing for the probability of occurrence of symbols of length i) and $\sum p_i \log i \propto \sum p_i \log 1/p_i$, the first moments of our distributions appear to be proportional to the information $I_1$ for vocabulary and $I_2$ for occurrences which, according to the theory of information (Brillouin, 1956), are calculated as
\[ I_1 = -K \sum p_{1i} \log p_{1i}, \]
and
\[ I_2 = -K \sum p_{2i} \log p_{2i}, \]
where $p_{1i}$ is the probability of word length i in the dictionary and $p_{2i}$ in the spoken language. For a binary system, for which K = 3.322, these quantities become in our case:
Letters: $I_1$ = 2.988 bits/word length, M = 5.417;
$I_2$ = 2.628 bits/word length, M = 3.508.
Phonemes: $I_1$ = 2.726 bits/word length, M = 4.413;
$I_2$ = 2.274 bits/word length, M = 2.856.
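The calculation of $I$ and $M$ can be sketched from the rounded percentages of Table 1. The example below takes the vocabulary distribution by letters; the function names are hypothetical, and since the percentages are rounded to one decimal the results can only be expected to come close to the quoted $I_1$ = 2.988 and M = 5.417, not to reproduce them exactly.

```python
import math

K = 1.0 / math.log10(2.0)      # 3.322, converting base-10 logarithms to bits

def information_bits(p):
    """I = -K * sum p_i log10 p_i, in bits per word length."""
    return -K * sum(q * math.log10(q) for q in p if q > 0)

def mean_length(p):
    """M = sum i * p_i, the average word length in units."""
    return sum(i * q for i, q in enumerate(p, start=1))

# Vocabulary distribution by number of letters (percentages from Table 1).
letters_vocab_pct = [0.3, 3.2, 9.8, 26.5, 19.1, 15.6, 10.0,
                     6.1, 4.2, 2.6, 1.9, 0.4, 0.2]
p = [pct / 100.0 for pct in letters_vocab_pct]

I1_letters = information_bits(p)
M_letters = mean_length(p)
```

The same two functions applied to the other three columns of Table 1 give the remaining pairs of $I$ and $M$.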
This enables us to compare the information property of letters against phonemes, and for 


each of vocabulary against occurrences. What we are doing, in using the entropy J as a 
characteristic instead of the average M, is to make use of the fundamental rule which plays 










a most important part in statistical thermodynamics, viz. that the average and the most 
probable distribution coincide for large numbers. The relation between the two is for 
binary coding expressed by Shannon’s fundamental theorem $I \le M$.

When trying to assess which code is more efficient, we must not be misled by the impression 
that a greater I means more information, and thus a more efficient code. The entropy I
which originally was considered by Shannon as a measure of information is now recognized 
(by Brillouin, for instance) as being a measure of unexpectedness or lack of information, 
which has compelled Brillouin to introduce the concept of negentropy for measuring 
information. The apparent confusion of thought is resolved by the following consideration: 
A small entropy (and a great redundancy) implies a small effort in terms of number of 
guesses for restoring missing information; only a great store of information about the 
structure of the code will enable the receiver of the message to make correct guesses with 
little effort, and a small I (great R) thus implies much past or advance information. On the
other hand, if we think in terms of what is gained by such guesses, that is, future information, 
then it is clear that symbols with great probability of occurrence which are easily guessed 
correctly will add little to our knowledge, and thus represent little information, whereas 
rarely occurring symbols which are only guessed with difficulty mean much information 
(and little redundancy). 

For our data we find

(a) $I_v$(phon.) = 2.726 < $I_v$(lett.) = 2.988,
    $I_o$(phon.) = 2.274 < $I_o$(lett.) = 2.628,

and thus I(phon.) < I(lett.).
Further, we find

(b) $I_o$(phon.) = 2.274 < $I_v$(phon.) = 2.726,
    $I_o$(lett.) = 2.628 < $I_v$(lett.) = 2.988,

and thus $I_o < I_v$.

This means 

(a) that the phonemic channel is more efficient than the letter channel; and 

(b) that the knowledge of occurrences makes it easier to guess at missing information 
and that it therefore represents some additional information over and above that provided 
by the mere vocabulary code. 

The basis of linguistic information theory is the hypothetical solidarity between the 
various items of a specified linguistic characteristic, word length in our case. Such solidarity 
exists between the items in the dictionary and between their frequency of occurrence in 
the spoken and written language, and makes it possible to guess—with some expectation 
of being correct—at the word length of individual items chosen at random from either 
dictionary or speech. The number of such guesses for arriving at a probably correct result 
is epitomized by the value of the entropy. The fact that $I_o$ is less than $I_v$ means that the
uncertainty of word length in speech is less than in the dictionary. This is only what one 
might expect, since in speech the solidarity of word length in the dictionary is supplemented 
by that of occurrence, and we have therefore more information available for guessing word 
length correctly. 

Returning now to the conclusion reached in §I, it seems rather interesting that the
introduction of the lognormal hypothesis for the mathematical structure of our distributions
should enable us to add, and subtract, information when only one distribution,
vocabulary or occurrences, is given. For this is what the estimation of the other distribution
means in terms of information theory. Being given the vocabulary distribution and deriving
that of occurrences by the method established above, we reduce $I_v$ to $I_o$ which means less
deciphering work because of more advance information, and vice versa if the vocabulary
distribution is derived from the occurrence distribution.


SUMMARY 


1. From a representative sample of 76,054 words of spoken English the frequency
distributions of vocabulary and occurrences according to word length in terms of letters and
phonemes were obtained, and subjected to statistical and information-theoretical analysis.

2. The distributions were found to be sensibly lognormal. 

3. The relation between the occurrence and vocabulary distributions according to
either letter or phoneme number is such that they could be interpreted as different moment
distributions of a lognormal variate. This admits the derivation of one distribution by the
moment distribution theorem for lognormal variates if the other is known. The practical
importance of this lies in the possibility of a quick estimate of the occurrence distribution
without having to carry out a complete word count or of the vocabulary distribution
without a dictionary count.

4. A transformation of the logarithmic variable of the distributions by using an
established relation between frequency and length of linguistic symbols shows the occurrence
and vocabulary means to be of the form of the theoretical measure of information. The
comparison of the statistical results with their information-theoretical interpretation
enables us to assess the value, in terms of information, of using the moment distribution
theorem of lognormality.


REFERENCES 


Aitchison, J. & Brown, J. A. C. (1957). The Lognormal Distribution. Cambridge University Press.

Brillouin, L. (1956). Science and Information Theory. New York: Academic Press Inc.

French, N. R., Carter, C. W. Jr. & Koenig, W. Jr. (1930). Words and sounds of telephone
conversations. Bell Syst. Tech. J. 9, 290.

Good, I. J. (1957). Distribution of word frequencies. Nature, Lond., 179, 595.

Herdan, G. (1956). Language as Choice and Chance. Groningen: P. Noordhoff, N.V.

Herdan, G. (1953). Small Particle Statistics. Amsterdam: Elsevier.

Mandelbrot, Benoit (1953). An informational theory of the statistical structure of language. In
Communication Theory (ed. by Willis Jackson). London: Butterworths.

Simon, H. A. (1955). On a class of skew distribution functions. Biometrika, 42, 425.

Williams, C. B. (1940). A note on the statistical analysis of sentence length as a criterion of literary
style. Biometrika, 31, 356.

Williams, C. B. (1956). Studies in the history of probability and statistics. IV. A note on an early
statistical study of literary style. Biometrika, 43, 248.

Zipf, George Kingsley (1949). Human Behaviour and the Principle of Least Effort. Addison-Wesley
Press.







SECOND PAPER ON STATISTICS ASSOCIATED WITH THE 
RANDOM DISORIENTATION OF CUBES 


By J. K. MACKENZIE 


Division of Tribophysics, Commonwealth Scientific and Industrial Research 
Organization, University of Melbourne, Australia 


Theoretical density functions are obtained for the angle of disorientation (the least angle of rotation
required to rotate a cube into a standard orientation) and for Min ⟨100⟩ (the least of the nine acute
angles between the edges of a cube and the edges of a fixed reference cube). These density functions and
their cumulative distribution functions have been evaluated numerically.


1. INTRODUCTION 


In a recent paper Mackenzie & Thomson (1957) described a class of problems in three- 
dimensional geometrical probability, and some of the associated density functions were 
estimated numerically by means of random sampling. In this paper two of these density 
functions are obtained in analytical form and, together with their cumulative distribution 
functions, evaluated numerically.†

The two density functions obtained are those for the angle of disorientation and Min ⟨100⟩.
These two variables can be defined as follows. Consider two cubes, A and B, and imagine
A to be a reference cube with its edges parallel to a fixed set of co-ordinate axes and its
centre at the origin, while B is initially coincident with A but free to rotate in any manner
about the common centre of A and B. If B is given an arbitrary rotation there are 24 definite
rotations which will restore B into coincidence with A; these are just the reverse of the
original rotation taken together with the 24 proper symmetry operations associated with
a cube having indistinguishable faces (see §2). The angle of disorientation is the least
(in magnitude) of the 24 angles of rotation so obtained, while Min ⟨100⟩ is the least of the
nine acute angles between the edges of the cube B in an arbitrary orientation and the
edges of A.

The success of the present calculations has depended essentially on reducing the amount 
of detailed calculation required although this is still quite considerable. Since this reduction 
can be made for a whole class of problems, including the two special cases discussed in detail, 
a formulation of this class is given in § 2 together with their formal solution in a form which 
is of no practical use. Section 3 is devoted to reductions common to the whole class, while 
§4 completes the reduction for the two special cases. The fact that these reductions involve 
arguments which are basically of a group-theoretical nature suggests that a more systematic 
use of group theory might make practicable a solution for the whole class. 

The density functions for the angle of disorientation and Min ⟨100⟩ are given both
analytically and numerically in §§5 and 6. A large amount of algebraic detail has been 
omitted in these sections and only a few important intermediate results are given. 


† Following the preparation of the paper by Miss Thomson and myself, a copy of which was sent via
Mr Hammersley to Mr D. C. Handscomb, the latter wrote to me to say that he had found the exact
distributions of the angle of disorientation by geometrical means, and gave the formulae (5.3), (5.5)
and (5.6) of my paper below. His method (the particulars of which I have not seen at the time of writing)
is, I understand, quite different from mine and I have therefore given my own derivation in full.
His paper, I am informed, is to appear in the Canadian Journal of Mathematics.










2. FORMULATION OF PROBLEMS 


Since the group of symmetry operations on a cube with indistinguishable faces (the cubic 
group) plays a fundamental role in both the formulation and the reduction of the class of 
problems, a brief statement of these symmetry operations will first be given. 

If the cube A has indistinguishable faces it is invariant under the 48 symmetry operations
of the cubic group consisting of 24 proper rotations and 24 improper rotations, which are
proper rotations together with an inversion or a reflexion. The 24 proper rotations are
(a) the identity element or no rotation, (b) rotations of 180° about the three axes of reference,
(c) rotations of ±90° about the same axes, (d) rotations of 180° about axes parallel to the
six face diagonals of the cube, and (e) rotations of ±120° about axes parallel to the four
diagonals of the cube. On taking axes of reference parallel to the edges of the cube these 24
rotations can be represented by 3 × 3 orthogonal matrices. These matrices have only three
non-zero elements which are either +1 or −1 and these are arranged in all possible ways
such that there is a non-zero element in each row and column and the determinant of the
matrix is +1.
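The matrix description just given can be turned directly into an enumeration. The following is a minimal sketch in Python (the function names are illustrative, not taken from the paper): it lists every signed permutation matrix, keeps those of determinant +1, and confirms that there are 24 of them.

```python
from itertools import permutations, product


def det3(m):
    """Determinant of a 3 x 3 matrix given as a tuple of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def matmul(a, b):
    """Product of two 3 x 3 matrices stored as tuples of rows."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))


def proper_cubic_rotations():
    """All matrices with one entry +1 or -1 in each row and column and determinant +1."""
    mats = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            m = tuple(tuple(signs[i] if j == perm[i] else 0 for j in range(3))
                      for i in range(3))
            if det3(m) == 1:
                mats.append(m)
    return mats


S = proper_cubic_rotations()
print(len(S))  # 24
```

The traces of these matrices separate the five classes (a) to (e): a trace of 3 for the identity, −1 for the nine rotations of 180°, 1 for the six rotations of ±90° and 0 for the eight rotations of ±120°.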

In all that follows, the matrices representing these proper symmetry rotations will be
denoted by $S_i$ ($i = 1, \ldots, 24$); the improper rotations are then $-S_i$. Further, the 3 × 3
orthogonal matrix which represents an arbitrary proper rotation through an angle $\psi$ about
an axis in the direction $n = [n_1 n_2 n_3]$ will be denoted by R with elements $r_{ij}$ given by (3.1)
and this rotation will be described briefly as either the rotation R, the rotation $\psi$, n or the
rotation $\psi[n_1 n_2 n_3]$.

Since
$$\mathrm{Tr}(R) = r_{11} + r_{22} + r_{33} = 1 + 2\cos\psi, \tag{2.1}$$
it follows that the angle of disorientation $\psi_0$ is given by
$$1 + 2\cos\psi_0 = \max_i \mathrm{Tr}(RS_i) = \max_{i,j} \mathrm{Tr}(S_i R S_j), \tag{2.2}$$
on using the facts that for any matrices B, C, Tr(BC) = Tr(CB) provided both products
exist and that the product $S_j S_i$ is another symmetry rotation.

Further, a generalized variable Min ⟨uvw⟩ can be defined as follows. Let u be a 3 × 1
column matrix with elements equal to the direction cosines of the direction [uvw], so that
the set ⟨uvw⟩ of variants of [uvw] are the 24 directions $S_i u$ together with the 24 directions
$-S_i u$. Thus, the cosine of the angle $\theta_{ij}$ between a variant, $\pm S_i u$, and what another variant,
$\pm S_j u$, becomes after a rotation R, is given by $\cos\theta_{ij} = \pm u' S_i R S_j u$ when the usual scalar
product is written in matrix notation. Then
$$\cos(\mathrm{Min}\,\langle uvw\rangle) = \max_{i,j} |u' S_i R S_j u| \tag{2.3}$$
$$= \max_{i,j} |\mathrm{Tr}(S_i R S_j u u')|. \tag{2.4}$$

Since the variants ⟨100⟩ of [100] are parallel to the edges of the cube A the definition of
Min ⟨100⟩ given in the introduction is a special case of (2.3). Equation (2.4) leads to an
equivalent definition of Min ⟨100⟩. For, if u = [100], $S_j u u' S_i$ is a matrix with only one
non-zero element which is ±1 and may be in any position in the matrix; the trace of the product
with R then gives ± the corresponding element in R′. Thus, the cosine of the angle Min ⟨100⟩
is the largest of the moduli of the elements of the orthogonal 3 × 3 matrix R.
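Both definitions can be evaluated directly for a given matrix R. The sketch below is an illustration under stated assumptions, not the author's computing procedure: the function names are hypothetical, the angle of disorientation is taken from the largest trace of $RS_i$ as in (2.2), and Min ⟨100⟩ from the largest modulus among the elements of R. The example matrix is the rotation of 60° about [111], written out from the standard axis-angle formula.

```python
import math
from itertools import permutations, product


def proper_cubic_rotations():
    """The 24 signed permutation matrices with determinant +1."""
    mats = []
    for perm in permutations(range(3)):
        # Parity of the permutation: +1 for even, -1 for odd.
        inversions = sum(perm[i] > perm[j] for i in range(3) for j in range(i + 1, 3))
        parity = -1 if inversions % 2 else 1
        for signs in product((1, -1), repeat=3):
            if parity * signs[0] * signs[1] * signs[2] == 1:
                mats.append([[signs[i] if j == perm[i] else 0 for j in range(3)]
                             for i in range(3)])
    return mats


def disorientation_deg(R, group):
    """Angle in degrees from 1 + 2 cos(psi) = max over i of Tr(R S_i)."""
    best = max(sum(R[i][j] * S[j][i] for i in range(3) for j in range(3))
               for S in group)
    cos_psi = max(-1.0, min(1.0, (best - 1.0) / 2.0))
    return math.degrees(math.acos(cos_psi))


def min_100_deg(R):
    """Min<100> in degrees: arccos of the largest modulus among the elements of R."""
    largest = max(abs(x) for row in R for x in row)
    return math.degrees(math.acos(min(1.0, largest)))


# Rotation of 60 degrees about [111].
R = [[2 / 3, -1 / 3, 2 / 3],
     [2 / 3, 2 / 3, -1 / 3],
     [-1 / 3, 2 / 3, 2 / 3]]
group = proper_cubic_rotations()
print(round(disorientation_deg(R, group), 4))  # 60.0
print(round(min_100_deg(R), 2))                # 48.19
```

For this particular rotation Min ⟨100⟩ equals arccos ⅔, which is the upper end of the range found for that variable in §6.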







If in (2.4) the $S_i$, $S_j$ are allowed to range over the full cubic group, the modulus sign can
be dropped. Then comparison of (2.2) with (2.4) shows that all cases are special cases of
$$x(A) = \max_{i,j} \mathrm{Tr}(S_i R S_j A), \tag{2.5}$$
where A is a given (symmetric) matrix and the S range independently either over the whole
cubic group or over the subgroup of proper rotations. If V(R) is a given probability measure
on the space of (proper) orthogonal matrices, the cumulative distribution function of x(A) is
$$P\{x(A) < X\} = \int dV(R), \tag{2.6}$$
where the region of integration includes all R for which
$$x(A) < X. \tag{2.7}$$
This formal solution is no more than a statement of what is required and the practical
problem is first to assign a suitable measure to V(R) and second to determine the region of
integration.


3. THE PROBABILITY MEASURE AND REDUCTION OF THE REGION OF INTEGRATION 


When concerned with problems in geometrical probability the appropriate probability 
measure is determined by a principle of invariance enunciated by Deltheil (1926, p. 13). 
This principle asserts that the result of the calculation must be invariant for any displace- 
ment of the whole figure. In the present case, this means invariant for any rotation of the 
cubes A and B together as a unit. 

Writing $c = 1 - \cos\psi$ and $s = \sin\psi$, the rotation $\psi$, n has the matrix representation
$$R = \begin{pmatrix} 1 - c + n_1^2 c & n_1 n_2 c - n_3 s & n_1 n_3 c + n_2 s \\ n_1 n_2 c + n_3 s & 1 - c + n_2^2 c & n_2 n_3 c - n_1 s \\ n_1 n_3 c - n_2 s & n_2 n_3 c + n_1 s & 1 - c + n_3^2 c \end{pmatrix}, \tag{3.1}$$
and in this case Deltheil (1926, p. 105) shows that the element of probability measure is
$$dV(R) = (1/2\pi^2) \sin^2 \tfrac12\psi \, d\psi \, dS, \tag{3.2}$$
where dS is an element of area on the surface of the unit (hemi-)sphere $n_1^2 + n_2^2 + n_3^2 = 1$
and, if spherical polar co-ordinates $\theta$, $\phi$ are used to specify the axis of rotation,
$$dS = \sin\theta \, d\theta \, d\phi. \tag{3.3}$$
The whole space of R is covered once if $-\pi \le \psi \le \pi$, $0 \le \phi \le 2\pi$ and $0 \le \theta \le \tfrac12\pi$, and the total
volume of the space is unity.
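As a numerical check on (3.1) and (3.2), the following minimal sketch (function names are illustrative) builds R from $\psi$ and n, confirms the trace relation (2.1) for one example, and integrates the probability element over the whole space, the integral over the hemisphere contributing its area 2π.

```python
import math


def rotation_matrix(psi, n):
    """Matrix (3.1) for a rotation through psi (radians) about the unit axis n."""
    n1, n2, n3 = n
    c = 1.0 - math.cos(psi)
    s = math.sin(psi)
    return [
        [1 - c + n1 * n1 * c, n1 * n2 * c - n3 * s, n1 * n3 * c + n2 * s],
        [n1 * n2 * c + n3 * s, 1 - c + n2 * n2 * c, n2 * n3 * c - n1 * s],
        [n1 * n3 * c - n2 * s, n2 * n3 * c + n1 * s, 1 - c + n3 * n3 * c],
    ]


def total_volume(steps=1000):
    """Integrate (1 / 2 pi^2) sin^2(psi / 2) over psi in [-pi, pi] and over the hemisphere."""
    h = 2.0 * math.pi / steps
    psi_integral = sum(math.sin((-math.pi + (k + 0.5) * h) / 2.0) ** 2
                       for k in range(steps)) * h
    hemisphere_area = 2.0 * math.pi
    return psi_integral * hemisphere_area / (2.0 * math.pi ** 2)


R = rotation_matrix(math.radians(60.0), (1 / math.sqrt(3),) * 3)
trace = R[0][0] + R[1][1] + R[2][2]
print(round(trace, 6))           # 2.0, i.e. 1 + 2 cos 60 degrees
print(round(total_volume(), 6))  # 1.0
```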

The density function defined by (3-2) is the analogue of a uniform density for a one- 
dimensional variable with a finite range. Further, the invariance properties of this density 
and its uniqueness arising therefrom ensure that it is identical with that implied by
Mackenzie & Thomson (1957) in the construction of their random orthogonal matrices.

The region of integration can now be subdivided into $24^2 = 576$ equivalent regions. For,
consider the pair of rotations R and $S_k R S_l$. Since products of the type $S_i S_k$ or $S_l S_j$ run
through the complete sequence of symmetry rotations as $S_i$ or $S_j$ do so, it follows from (2.5)
that the value of x(A) is the same for both rotations. But the invariance properties of the
probability measure defined by (3.2) ensure that corresponding regions in the neighbourhood
of the two rotations have equal volumes and so only 1/576 of the total volume need be
considered.

The same result can also be reached using the geometrical model mentioned in the
introduction. For suppose that the cube B is subjected to the sequence of rotations $S_k R S_l$.
The above result now follows on using Deltheil’s principle of invariance, provided that the
final geometrical relationship between the cubes A and B is the same whatever the symmetry
rotations $S_k$, $S_l$ may be. That this is so can be seen as follows. After the symmetry rotation $S_l$
the cube B is still coincident with the cube A so that after the further rotation R the
relationship between A and B is independent of $S_l$. If the two cubes are now rotated together as
a rigid body by the rotation $S_k$, the cube B reaches its final orientation and A remains
invariant; thus, the final relationship between the cubes A and B is independent of $S_k$ also.

Although a preliminary subdivision of the region of integration into 24 equivalent regions 
can be defined in a simple way in the general case, the further subdivision of each of these 
regions into 24 parts is carried out in a manner suited to the two special problems. The 
preliminary subdivision is determined by the fact that if R represents a rotation $\psi$, n then
$SRS^{-1}$ represents a rotation $\psi$, Sn. The end-point of the unit vector n lies on the surface
of a unit hemisphere and the product Sn simply permutes the components of n in order 
and sign. Thus, the surface of the unit hemisphere can be divided into 24 equivalent spherical 
triangles bounded by great circles for which either the moduli of two components of n are 
equal or one of the components is zero. It suffices to consider those axes n which lie in any 
one of these triangles. 

Since rotations about the same axis and through the same angle but in opposite senses
leave the geometrical relationship between the cubes A and B unchanged, a further halving
of the region of integration is achieved. Thus, it may be assumed that the angle of rotation
$\psi$ is positive and that the axis of rotation lies in the spherical triangle defined by

$$n_1 \ge n_2 \ge n_3 \ge 0.$$


4. THE REGION OF INTEGRATION 


It is convenient to carry out first the final subdivision and specification of the region of 
integration for the case of the angle of disorientation and then show that the same region is 
suitable for Min ⟨100⟩.

Given any rotation R there is, in general, just one equivalent rotation $RS_i$ for which the
angle of rotation is minimum.† Then, there is a unique equivalent matrix $R^* = S_j R S_i S_j^{-1}$
for which the angle of rotation is least and the axis of rotation lies in the triangle defined by
$n_1 \ge n_2 \ge n_3 \ge 0$. But according to the result obtained in the last section it suffices to consider
only those rotations R for which $\psi \ge 0$ and $n_1 \ge n_2 \ge n_3 \ge 0$. The final reduction of the region
of integration is now determined by the conditions which ensure that $R = R^*$.

When R is given by (3.1), calculation shows that the cosine of half the angle of rotation
determined by the matrix RS is given by the modulus of one of the five expressions
$$\cos\tfrac12\psi, \quad n_1 \sin\tfrac12\psi, \quad (n_1 \sin\tfrac12\psi + \cos\tfrac12\psi)/\sqrt2, \quad [(n_1 + n_2)\sin\tfrac12\psi]/\sqrt2, \quad \tfrac12(n_1 + n_2 + n_3)\sin\tfrac12\psi + \tfrac12\cos\tfrac12\psi, \tag{4.1}$$
together with the modulus of what these five expressions become when $n_1$, $n_2$, $n_3$ are
permuted in all possible ways in order and in sign (24 values in all). Those written down
correspond to the cases where S is the rotation identity, 180°[100], 90°[100], 180°[110] and
−120°[111]. Clearly, when $0 \le \psi \le \pi$ and $n_1 \ge n_2 \ge n_3 \ge 0$, the largest value of the cosine will
be one of those set out explicitly in (4.1).

† If this angle happens to be negative the transposed (or inverse) matrix represents an equivalent
rotation through a positive angle.
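The 24 half-angle cosines lend themselves to a random sampling check of the kind used by Mackenzie & Thomson. The sketch below is an illustration only and rests on assumptions that are not in the paper: a rotation is stored as the four numbers $(\cos\frac12\psi, n_1\sin\frac12\psi, n_2\sin\frac12\psi, n_3\sin\frac12\psi)$, and random rotations are drawn by normalizing four independent Gaussian deviates, a standard construction that reproduces the measure (3.2) but is not the random orthogonal matrix construction of the earlier paper. The function names and the seed are arbitrary.

```python
import math
import random
from itertools import combinations, product


def half_angle_cosines(q):
    """The 24 values obtained from (4.1) by permuting n1, n2, n3 in order and sign.

    q = (cos(psi/2), n1 sin(psi/2), n2 sin(psi/2), n3 sin(psi/2)).
    """
    vals = [abs(x) for x in q]                        # 4 values
    for i, j in combinations(range(4), 2):            # 12 values
        vals.append(abs(q[i] + q[j]) / math.sqrt(2.0))
        vals.append(abs(q[i] - q[j]) / math.sqrt(2.0))
    for signs in product((1, -1), repeat=3):          # 8 values
        vals.append(abs(q[0] + sum(sg * x for sg, x in zip(signs, q[1:]))) / 2.0)
    return vals


def disorientation_deg(q):
    """Least angle of rotation, in degrees, among the 24 equivalent rotations."""
    return math.degrees(2.0 * math.acos(min(1.0, max(half_angle_cosines(q)))))


def random_rotation(rng):
    """Four normalized Gaussian deviates (assumed sampling scheme, see lead-in)."""
    while True:
        q = [rng.gauss(0.0, 1.0) for _ in range(4)]
        norm = math.sqrt(sum(x * x for x in q))
        if norm > 1e-12:
            return [x / norm for x in q]


rng = random.Random(1958)
sample = [disorientation_deg(random_rotation(rng)) for _ in range(20000)]
print(sum(sample) / len(sample))  # sample mean; the text quotes 40.736 degrees
print(max(sample))                # should not exceed the upper limit of about 62.80 degrees
```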
If $R = R^*$, then $\cos\tfrac12\psi$ must be the greatest of the five expressions in (4.1) and after some
manipulation of inequalities the region of integration is found to be
$$\begin{aligned} 0 \le \tan\tfrac12\psi &\le (\sqrt2 - 1)/n_1 && \text{for } \sqrt2\, n_1 \ge n_2 + n_3, \\ 0 \le \tan\tfrac12\psi &\le 1/(n_1 + n_2 + n_3) && \text{for } \sqrt2\, n_1 \le n_2 + n_3, \\ n_1 \ge n_2 &\ge n_3 \ge 0. \end{aligned} \tag{4.2}$$
The same region is also suitable for Min ⟨100⟩ since $r_{11}$ is the greatest element in $R^*$.
For, within the region (4.2), it is easily shown by using (3.1) that
$$r_{11} \ge r_{22} \ge r_{33} \ge 0, \qquad r_{32} \ge r_{13} \ge r_{21} \ge r_{12} \ge r_{31} \ge r_{23}, \tag{4.3}$$
and that $r_{21} \ge 0$, $r_{31} \le 0$, while $r_{12}$ may be either positive or negative. Finally $r_{11} \ge r_{32}$ provided
$$[1 - (n_1 + n_2 + n_3)\tan\tfrac12\psi]\,[1 - (n_1 - n_2 - n_3)\tan\tfrac12\psi] \ge 0, \tag{4.4}$$
and in the region (4.2) both factors are positive.
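The statement that $r_{11}$ is the greatest element throughout the region (4.2) can be spot-checked by sampling. The following is a minimal sketch with hypothetical function names; the way the axes are sampled (sorted moduli of a Gaussian vector) is a convenience of the sketch and plays no part in the argument.

```python
import math
import random

SQ2 = math.sqrt(2.0)


def rotation_matrix(psi, n):
    """Matrix (3.1) for a rotation through psi (radians) about the unit axis n."""
    n1, n2, n3 = n
    c = 1.0 - math.cos(psi)
    s = math.sin(psi)
    return [
        [1 - c + n1 * n1 * c, n1 * n2 * c - n3 * s, n1 * n3 * c + n2 * s],
        [n1 * n2 * c + n3 * s, 1 - c + n2 * n2 * c, n2 * n3 * c - n1 * s],
        [n1 * n3 * c - n2 * s, n2 * n3 * c + n1 * s, 1 - c + n3 * n3 * c],
    ]


def in_region(psi, n):
    """True when (psi, n) satisfies the inequalities (4.2); psi in radians, 0 <= psi < pi."""
    n1, n2, n3 = n
    if psi < 0.0 or not (n1 >= n2 >= n3 >= 0.0):
        return False
    t = math.tan(psi / 2.0)
    if SQ2 * n1 >= n2 + n3:
        return t <= (SQ2 - 1.0) / n1
    return t <= 1.0 / (n1 + n2 + n3)


def r11_is_largest(R, tol=1e-12):
    """True when r11 is at least as large as the modulus of every element of R."""
    return all(R[0][0] >= abs(x) - tol for row in R for x in row)


def random_axis_in_triangle(rng):
    """Unit vector with n1 >= n2 >= n3 >= 0."""
    while True:
        v = sorted((abs(rng.gauss(0.0, 1.0)) for _ in range(3)), reverse=True)
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:
            return tuple(x / norm for x in v)


rng = random.Random(0)
checked = 0
for _ in range(20000):
    n = random_axis_in_triangle(rng)
    psi = rng.uniform(0.0, math.radians(63.0))
    if in_region(psi, n):
        assert r11_is_largest(rotation_matrix(psi, n))
        checked += 1
print(checked)  # number of sampled rotations that fell inside region (4.2)
```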


5. THE ANGLE OF DISORIENTATION 


When attention is restricted to the region of integration defined by (4.2) and the axis of
rotation is specified in spherical polar co-ordinates the probability element (3.2) becomes
$$dV(R^*) = (576/\pi^2) \sin\theta \, \sin^2\tfrac12\psi \, d\theta \, d\phi \, d\psi. \tag{5.1}$$
Thus, the density function for the angle of disorientation $\psi$ is found by integrating (5.1)
with respect to $\theta$ and $\phi$ with $\psi$ fixed and for all $\theta$ and $\phi$ within the region (4.2), i.e. over part
of a unit sphere.

The spherical triangle within which $n_1 \ge n_2 \ge n_3 \ge 0$ is shown as STU on part of a
stereographic projection (Barrett, 1952) in Fig. 1. The region ABSU contains that part of the
region of integration determined by the first set of inequalities in (4.2) while ABT contains
the part determined by the second set. Now for a fixed $\psi$ the first inequality of (4.2) can
be written
$$n_1 \le (\sqrt2 - 1)/\tan\tfrac12\psi, \tag{5.2}$$
and it is clear that for $0 \le \tan\tfrac12\psi \le \sqrt2 - 1$ or $0 \le \psi \le 45°$, this is always satisfied. Similarly,
the second inequality is always satisfied for $0 \le \psi \le 60°$. Thus, for $0 \le \psi \le 45°$ the region of
integration is the whole of STU. When $\psi > 45°$, equality in (5.2) determines a small circle
$P_1Q_1$ with its centre at [100], so that for $45° \le \psi \le 60°$ the region of integration is STU with
the part $SP_1Q_1$ removed. Likewise when $\psi$ is just greater than 60° a part $TP_1'Q_1'$ determined
by a small circle centred on [111] is removed from consideration so that only the region
$P_1P_1'Q_1'UQ_1$ remains; it is readily verified that the arc $P_1Q_1$ is just short of a small circle
joining U and B. As $\psi$ increases further the arcs $P_1Q_1$ and $P_1'Q_1'$ move towards one another
until $P_1$ and $P_1'$ coincide with B (and $Q_1$ with U) when $\tan\tfrac12\psi = \sqrt2(\sqrt2 - 1)$ or $\psi = 60.72°$.
Finally, the common point $P_1$ moves along the arc BA until the region of integration $P_1Q_1$
disappears at A when $\tan\tfrac12\psi = (\sqrt2 - 1)(5 - 2\sqrt2)^{1/2}$ or $\psi = 62.80°$.










Thus, there are four ranges of $\psi$ to consider and, within each, the density function takes
a different analytical form. Each range is considered in turn below and the densities have
all been multiplied by $\pi/180$ so that $\psi$ is measured in degrees.

(a) $0 \le \tan\tfrac12\psi \le \sqrt2 - 1$ or $0 \le \psi \le 45°$. The surface area of the triangle STU is $\pi/12$ so
that the density function is
$$p(\psi) = (2/15)(1 - \cos\psi). \tag{5.3}$$

(b) $\sqrt2 - 1 \le \tan\tfrac12\psi \le 1/\sqrt3$ or $45° \le \psi \le 60°$. In this case, a contribution from the area
$SP_1Q_1$ must be subtracted from (5.3). Using polar co-ordinates $\theta$, $\phi$ with [100] as pole and
$\phi = 0$ on SU the area $SP_1Q_1$ is given by
$$\int_{n_1}^{1} \int_0^{\pi/4} d\phi \, dx = \tfrac14\pi[1 - (\sqrt2 - 1)\cot\tfrac12\psi], \tag{5.4}$$
where $x = \cos\theta$ and $n_1$ is given by (5.2) with the sign of equality. Thus,
$$p(\psi) = (2/15)[3(\sqrt2 - 1)\sin\psi - 2(1 - \cos\psi)]. \tag{5.5}$$

Fig. 1. Part of a stereographic projection of a hemisphere, showing the region of integration
for calculating the density function for the angle of disorientation.


(c) $1/\sqrt3 \le \tan\tfrac12\psi \le \sqrt2(\sqrt2 - 1)$ or $60° \le \psi \le 60.72°$. In addition to the area given by (5.4)
the area of the region $TP_1'Q_1'$ must also be subtracted from STU. This is most simply done
in terms of polar co-ordinates with [111] as pole and $\phi = 0$ on ST; the range of $\phi$ is 0 to $\tfrac13\pi$
and the limits for $\cos\theta$ are 1 and $(\cot\tfrac12\psi)/\sqrt3$. The final result is
$$p(\psi) = (2/15)[\{3(\sqrt2 - 1) + 4/\sqrt3\}\sin\psi - 6(1 - \cos\psi)]. \tag{5.6}$$


(d) $\sqrt2(\sqrt2 - 1) \le \tan\tfrac12\psi \le (\sqrt2 - 1)(5 - 2\sqrt2)^{1/2}$ or $60.72° \le \psi \le 62.80°$. This case is a little
more complicated than the preceding cases as it involves finding the area of the region
$BP_1Q_1U$ (where B and U are joined by a small circle centred at [100]) and the area of an
analogous region on the other side of AB. Using polar co-ordinates as in (b), it is found that
for a fixed $\theta$ the extreme values of $\phi$ at points such as $Q_1$ and $P_1$ are $\arccos(\cot\theta)$ and
$\tfrac14\pi - \arccos(\cot\theta)$ respectively. Thus the area of $BP_1Q_1U$ is given by
$$\int_{n_1}^{1/\sqrt2} [\tfrac14\pi - 2\arccos\{x/(1 - x^2)^{1/2}\}] \, dx, \tag{5.7}$$
where x and $n_1$ are the same as in (b). Similarly, using [111] as pole as in (c) the analogous
area on the other side of AB is found to be
$$\int [\tfrac13\pi - \arccos\{(\sqrt2 - 1)^2 x/(1 - x^2)^{1/2}\}] \, dx, \tag{5.8}$$
the range of integration being from $(\cot\tfrac12\psi)/\sqrt3$ to $(\sqrt2 + 1)/\sqrt6$. Finally, evaluating all the
integrals and combining the results as required gives
$$\begin{aligned} p(\psi) = {} & (2/15)[\{3(\sqrt2 - 1) + 4/\sqrt3\}\sin\psi - 6(1 - \cos\psi)] \\ & - (8/5\pi)[2(\sqrt2 - 1)\arccos(X\cot\tfrac12\psi) + (1/\sqrt3)\arccos(Y\cot\tfrac12\psi)]\sin\psi \\ & + (8/5\pi)[2\arccos\{(\sqrt2 + 1)X/\sqrt2\} + \arccos\{(\sqrt2 + 1)Y/\sqrt2\}](1 - \cos\psi), \end{aligned} \tag{5.9}$$
where
$$X = (\sqrt2 - 1)/[1 - (\sqrt2 - 1)^2\cot^2\tfrac12\psi]^{1/2}, \qquad Y = (\sqrt2 - 1)^2/[3 - \cot^2\tfrac12\psi]^{1/2}. \tag{5.10}$$


The density function has been computed from (5.3), (5.5), (5.6) and (5.9) and is tabulated
together with the cumulative distribution function in Table 1; the latter function was
obtained by numerical integration of the density function. The mean, the standard deviation
and the median were calculated to be
$$\bar\psi = 40.736°, \qquad \sigma = 11.315°, \qquad \psi_{\mathrm{med.}} = 42.341°. \tag{5.11}$$
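The four analytical forms can be collected into one function and compared with the tabulated values. The sketch below is illustrative only: the function names, the Simpson integration and the step counts are choices made here and are not the author's computing procedure; the formulae coded are (5.3), (5.5), (5.6) and (5.9) with X and Y as in (5.10).

```python
import math

SQ2 = math.sqrt(2.0)
SQ3 = math.sqrt(3.0)
PSI_B = math.degrees(2.0 * math.atan(SQ2 * (SQ2 - 1.0)))                         # about 60.72
PSI_A = math.degrees(2.0 * math.atan((SQ2 - 1.0) * math.sqrt(5.0 - 2.0 * SQ2)))  # about 62.80


def clamped_acos(u):
    """arccos with the argument clamped to [-1, 1] against rounding error."""
    return math.acos(max(-1.0, min(1.0, u)))


def density(psi_deg):
    """Density of the angle of disorientation, per degree."""
    if psi_deg <= 0.0 or psi_deg >= PSI_A:
        return 0.0
    psi = math.radians(psi_deg)
    s, v = math.sin(psi), 1.0 - math.cos(psi)
    if psi_deg <= 45.0:
        return (2.0 / 15.0) * v                                      # (5.3)
    if psi_deg <= 60.0:
        return (2.0 / 15.0) * (3.0 * (SQ2 - 1.0) * s - 2.0 * v)      # (5.5)
    base = (2.0 / 15.0) * ((3.0 * (SQ2 - 1.0) + 4.0 / SQ3) * s - 6.0 * v)
    if psi_deg <= PSI_B:
        return base                                                  # (5.6)
    cot = 1.0 / math.tan(psi / 2.0)
    x = (SQ2 - 1.0) / math.sqrt(1.0 - (SQ2 - 1.0) ** 2 * cot * cot)
    y = (SQ2 - 1.0) ** 2 / math.sqrt(3.0 - cot * cot)
    k = 8.0 / (5.0 * math.pi)
    return (base                                                     # (5.9)
            - k * (2.0 * (SQ2 - 1.0) * clamped_acos(x * cot)
                   + clamped_acos(y * cot) / SQ3) * s
            + k * (2.0 * clamped_acos((SQ2 + 1.0) * x / SQ2)
                   + clamped_acos((SQ2 + 1.0) * y / SQ2)) * v)


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


BREAKS = [0.0, 45.0, 60.0, PSI_B, PSI_A]


def integral(weight, upper=PSI_A):
    """Integral of weight(psi) * p(psi) from 0 to upper, split at the four breakpoints."""
    total = 0.0
    for lo, hi in zip(BREAKS, BREAKS[1:]):
        if upper > lo:
            total += simpson(lambda a: weight(a) * density(a), lo, min(hi, upper))
    return total


print(round(density(45.0), 5))                # Table 1 lists 0.03905
print(round(integral(lambda a: 1.0, 45.0), 5))  # Table 1 lists 0.59810 for the C.D.F. at 45
print(round(integral(lambda a: a), 3))        # the text quotes a mean of 40.736
```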


Table 1. Distribution of the angle of disorientation

   ψ°       p(ψ)      C.D.F.†   |    ψ°         p(ψ)      C.D.F.†
    0      0.00000    0.00000   |   60.0       0.01015    0.99228
    5      0.00051    0.00085   |   60.2       0.00856    0.99415
   10      0.00203    0.00676   |   60.4       0.00695    0.99570
   15      0.00454    0.02277   |   60.6       0.00533    0.99693
   20      0.00804    0.05383   |   60.72...   0.00434    0.99752
   25      0.01249    0.10477   |   61.0       0.00283    0.99850
   30      0.01786    0.18028   |   61.4       0.00151    0.99935
   35      0.02411    0.28487   |   61.8       0.00070    0.99978
   40      0.03119    0.42280   |   62.2       0.00024    0.99996
   45      0.03905    0.59810   |   62.6       0.00003    1.00000
   50      0.03167    0.77586   |   62.799...  0.00000    1.00000
   55      0.02201    0.91097   |
   60      0.01015    0.99228   |

† C.D.F. = cumulative distribution function.










The density function has also been plotted in Fig. 2 and has a sharp peak at 45°; in fact the
first derivative is discontinuous at $\psi = 45°$ and 60°, while the second derivative is
discontinuous at 60.72°. This confirms substantially the guess made by Mackenzie & Thomson
(1957) concerning the true nature of the distribution. The dots on Fig. 2 give a graphically
smoothed estimate of the density function obtained from the random sampling calculations.
The agreement between this estimate and the true density function is rather better than that
expected since a sample of only 150 was used.







































































Fig. 2. The density function for the angle of disorientation. The ordinate is probability density
when the angle is measured in degrees and the dots are estimates derived from random sampling.


6. Min ⟨100⟩

If $\alpha$ is the value of Min ⟨100⟩, then, using (3.1) and the result of § 4, it follows that
$$\sin\tfrac12\alpha = \sin\theta \, \sin\tfrac12\psi, \tag{6.1}$$
and the probability element (5.1) becomes
$$dV(R^*) = (288/\pi^2) \sin\alpha \, d\alpha \, d\beta \, d\phi, \tag{6.2}$$
where
$$\beta = \arcsin(t\cot\theta), \tag{6.3}$$
$$t = \tan\tfrac12\alpha. \tag{6.4}$$

The density function for $\alpha$ is found by integrating (6.2) with respect to $\beta$ and $\phi$ over the
region determined by (4.2) and (6.1). The main difficulties are the determination of the
appropriate limits of integration and the reduction of the double integrals to single
integrals.

The limits of integration are determined in two steps. First, a diagram is constructed
which shows the limits of the variables $\theta$ and $\psi$ for the case where the $\phi$ integration is carried
out first. Most of the results required to do this are available as a result of the calculations
in the preceding section, the only extension necessary arising from the fact that no use
can now be made of the simplifications which arose previously by the use of a pole at [111].
The second step is to use this diagram together with (6.1) to obtain the limits for $\theta$, and hence
$\beta$, when $\alpha$ is fixed.







The diagram is shown in Fig. 3 and the limits for $\phi$ in the three regions are
$$\begin{aligned} \text{I:} \quad & 0 \le \phi \le \tfrac14\pi, \\ \text{II:} \quad & \arccos(\cot\theta) \le \phi \le \tfrac14\pi, \\ \text{III:} \quad & \arccos(\cot\theta) \le \phi \le \tfrac14\pi - \arccos[(\cot\tfrac12\psi - \cos\theta)/\sqrt2\sin\theta] \\ \text{or} \quad & \arccos[t^{-1}\sin\beta] \le \phi \le \tfrac14\pi - \arccos[t^{-1}\sin(\tfrac14\pi - \beta)]. \end{aligned} \tag{6.5}$$

Fig. 3. Diagrams showing the limits of the variables $\theta$ and $\psi$ when the $\phi$ integration is carried
out first. Some typical curves with $\alpha$ a constant are shown dotted.


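For a fixed $\psi$, the boundaries SS′, BU and TT′ are determined by the values of $\theta$ at the
corresponding points in Fig. 1; the values of $\cos\theta$ are 1, $1/\sqrt2$ and $1/\sqrt3$, respectively. The
boundary S′BA is determined by the value of $\theta$ on the arcs $P_1Q_1$ on Fig. 1, BT′ by the value
of $\theta$ at $P_1'$ and AT′ by the value of $\theta$ at $Q_1'$. Thus, on
$$\begin{aligned} \text{S′BA:} \quad & \cos\theta = (\sqrt2 - 1)\cot\tfrac12\psi, \\ \text{BT′:} \quad & \cos\theta + \sqrt2\sin\theta = \cot\tfrac12\psi, \\ \text{AT′:} \quad & \cos\theta + \sin\theta(\cos\phi + \sin\phi) = \cot\tfrac12\psi, \quad \cos\phi = \cot\theta. \end{aligned} \tag{6.6}$$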










The dotted curves like hyperbolae drawn on Fig. 3 are curves of constant $\alpha$ determined
by (6.1), and the change of variables implied by (6.2) means that the integration with respect
to $\beta$ is along these curves on the diagram. The limiting values of $\beta$ are determined by the
intersections of the dotted curves with the solid boundaries drawn on Fig. 3. After some
calculation it is found that at
$$\begin{aligned} X_r \text{ on S′BA:} \quad & \beta = \tfrac18\pi, \\ Y_r \text{ on BT′:} \quad & \beta = \tfrac14\pi - \arcsin t, \\ Z_r \text{ on AT′:} \quad & \beta = \arcsin[t\cos(\tfrac18\pi + \gamma)], \\ T_r \text{ on TT′:} \quad & \beta = \arcsin(t/\sqrt2), \\ U_r \text{ on UB:} \quad & \beta = \arcsin t, \end{aligned} \tag{6.7}$$
where
$$\sin^2\gamma = [(\sqrt2 + 1)^2 t^2 - 1]/4\sqrt2\, t^2. \tag{6.8}$$

Clearly there are three ranges of $\alpha$ to be considered according as the curve $\alpha$ = constant
intersects S′B, BA or AT′ when the subscript r in (6.7) takes the values 0, 1 or 2, respectively.

The required integrals are all reduced in much the same way and only the case where
the curve $\alpha$ = constant intersects S′B will be treated; in this case

$$0 \le \tan\tfrac12\alpha \le [(\sqrt2 - 1)/2\sqrt2]^{1/2}, \quad \text{or} \quad 0 \le \alpha \le 41.88°.$$


Using Fig. 3, (6.5), and (6.7), the required double integral is written down and the
integration over $\phi$ carried out first. Then the term $\arccos(t^{-1}\sin\beta)$ arising from this integration is
immediately replaced by an integral again. Omitting the factor $(288/\pi^2)\sin\alpha$, the result
at this stage is
$$\tfrac14\pi \int_{\arcsin(t/\sqrt2)}^{\pi/8} d\beta - \int_{\arcsin(t/\sqrt2)}^{\arcsin t} d\beta \int_0^{\arccos(t^{-1}\sin\beta)} d\phi. \tag{6.9}$$
Finally, inverting the order of integration in the second integral and evaluating gives the
result
$$\tfrac{1}{32}\pi^2 - \int_0^{\pi/4} \arcsin(t\cos\phi) \, d\phi. \tag{6.10}$$
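The passage from (6.9) to (6.10) can be checked numerically for any admissible t. The following is a minimal sketch with illustrative function names; the Simpson rule and the step counts are choices made here, the large count in (6.9) being needed because the integrand has a square-root singularity in its derivative at the upper limit.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0


def expression_6_9(t):
    """(6.9) after the inner integration over phi has been carried out."""
    lower = math.asin(t / math.sqrt(2.0))
    first = (math.pi / 4.0) * (math.pi / 8.0 - lower)
    inner = lambda beta: math.acos(min(1.0, math.sin(beta) / t))
    return first - simpson(inner, lower, math.asin(t), 20000)


def expression_6_10(t):
    """(6.10): pi^2 / 32 minus the integral of arcsin(t cos phi) over 0..pi/4."""
    return math.pi ** 2 / 32.0 - simpson(lambda phi: math.asin(t * math.cos(phi)),
                                         0.0, math.pi / 4.0)


t = math.tan(math.radians(20.0) / 2.0)  # alpha = 20 degrees, inside the first range
print(expression_6_9(t), expression_6_10(t))
```

The two printed values should agree to within the accuracy of the quadrature.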


The same result (6-10) is obtained for the next range of α, but a different result is obtained 
in the third range of α. When α is measured in degrees, the required density function is 

$p(\alpha) = (8/5\pi)\sin\alpha\,\bigl[\tfrac1{32}\pi^2 - \int_0^{\pi/4}\operatorname{arsin}(t\cos\phi)\,d\phi\bigr],$ (6-11) 

when 0 < α < 45° and 


in ta+ 
p(a) = (8/57) sina syn? — | arsin (¢ cos #)dp—Iny+[ ” arsin (t cos ag, 

0 in—y 

(6-11’) 
when 45° < α < arcos(2/3) = 48.19° and where 
$t = \tan\tfrac12\alpha, \qquad \sin^2\gamma = [(\sqrt2+1)^2t^2-1]/4\sqrt2\,t^2.$ (6-12) 








Fig 


J. K. MACKENZIE 239 


For small α, equation (6-11) gives $p(\alpha) = (\tfrac1{20}\pi)\sin\alpha$, in agreement with the estimate 
made by Mackenzie & Thomson (1957). 

Both the density function and its cumulative distribution function have been computed 
and the results are given in Table 2 and Fig. 4. The density function and its first derivative 
are continuous at α = 45°, but the second derivative is discontinuous there. The dots on 


Table 2. Distribution of angle Min ⟨100⟩ 

   α°       p(α)      C.D.F.†       α°       p(α)      C.D.F.† 
    0     0.00000    0.00000        45     0.00290    0.99723 
    5     0.01232    0.03196        46     0.00119    0.99917 
   10     0.02180    0.11846        47     0.00033    0.99987 
   15     0.02835    0.24508        48     0.00001    1.00000 
   20     0.03191    0.39701        48.18  0.00000    1.00000 
   25     0.03241    0.55910 
   30     0.02980    0.71591 
   35     0.02403    0.85181 
   40     0.01508    0.95093 
   45     0.00290    0.99723 

† C.D.F. = cumulative distribution function. 
Fig. 4. The density function for the angle Min (100). The ordinate is probability density when the angle 
is measured in degrees and the dots are estimates derived from random sampling. 


Fig. 4 give a graphically smoothed estimate of the density function obtained from the 
random sampling calculations and again the agreement with the true density function is 
better than would have been expected. The mean, standard deviation, median and the 
mode were calculated to be 

$\bar\alpha = 23.164°$, $\sigma = 10.312°$, $\alpha_{\rm med} = 23.183°$, $\alpha_{\rm mode} = 23.308°$. (6-13) 













The integrals in (6-11) were evaluated by direct numerical integration while the integral 
in (6-10) was calculated from the power series 

$\sqrt2\int_0^{\pi/4}\operatorname{arsin}(t\cos\phi)\,d\phi = \sum_{r=0}^{\infty} a_{2r+1}t^{2r+1},$ (6-14) 

where $a_1 = 1$, $a_3 = 5/36$, $a_5 = 43/800$, $a_7 = 177/6272$, 
$a_9 = 2867/165{,}888$, $a_{11} = 11{,}531/991{,}232$, 
$a_{13} = 92{,}479/11{,}075{,}584$, $a_{15} = 74{,}069/11{,}796{,}480$. (6-15) 

In the neighbourhood of α = 45° the behaviour of the density function is given by 
$p(\alpha) = p_-(x) = 0.002896 - 0.002763x - 0.0000702x^2,$ (6-16) 
for x = α − 45 negative and 
$p(\alpha) = p_+(x) = p_-(x) + 0.001107x^{3/2} + 0.000018x^{5/2},$ (6-17) 
for x positive. Near the limit of the distribution at α = 48.19° 
$p(\alpha) = 0.000217(\alpha - 48.19)^2,$ (6-18) 
in agreement with the behaviour predicted by Mackenzie & Thomson (1957). 
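
The density for 0 ≤ α ≤ 45° can be recomputed as a check on Table 2. The following is only a sketch: it assumes that (6-11) reads p(α) = (8/5π) sin α [π²/32 − ∫ arsin(t cos φ) dφ], the integral being taken from 0 to π/4 with t = tan ½α, and that the series (6-14) is truncated after the eight coefficients of (6-15). The function names are illustrative and are not taken from the paper.

```python
import math

# Coefficients a_1, a_3, ..., a_15 quoted in (6-15).
A = [1.0, 5 / 36, 43 / 800, 177 / 6272, 2867 / 165888,
     11531 / 991232, 92479 / 11075584, 74069 / 11796480]


def integral_series(t):
    """Integral of arsin(t cos(phi)) over 0..pi/4 from the truncated series (6-14)."""
    return sum(a * t ** (2 * r + 1) for r, a in enumerate(A)) / math.sqrt(2)


def integral_simpson(t, panels=2000):
    """The same integral by composite Simpson's rule (panels must be even)."""
    h = (math.pi / 4) / panels
    f = lambda phi: math.asin(t * math.cos(phi))
    total = f(0.0) + f(math.pi / 4)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3


def density(alpha_deg, integral=integral_series):
    """p(alpha) per degree from the assumed form of (6-11); 0 <= alpha <= 45 only."""
    if not 0 <= alpha_deg <= 45:
        raise ValueError("this sketch covers only the range 0 to 45 degrees")
    a = math.radians(alpha_deg)
    t = math.tan(a / 2)
    return (8 / (5 * math.pi)) * math.sin(a) * (math.pi ** 2 / 32 - integral(t))


if __name__ == "__main__":
    for alpha in (5, 10, 20, 25, 40, 45):
        print(alpha, round(density(alpha), 5))
```

The printed values may be compared with the p(α) column of Table 2; passing `integral=integral_simpson` replaces the series by direct numerical integration.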


I wish to thank Mr D. C. Handscomb for communicating to me the final result for the 
density function for the angle of disorientation. 


REFERENCES 


BARRETT, C. S. (1952). Structure of Metals, chap. II, 2nd ed. New York: McGraw-Hill. 

DELTHEIL, R. (1926). Probabilités Géométriques. Tome II, fascicule II of Traité de Calcul des 
Probabilités et de ses Applications par E. Borel. Paris: Gauthier-Villars. 

HANDSCOMB, D. C. (to appear). On the random disorientation of two cubes. Canad. J. Math. 

MACKENZIE, J. K. & THOMSON, M. J. (1957). Some statistics associated with the random 
disorientation of cubes. Biometrika, 44, 205. 










[ 241 } 


CONDITIONED MARKOV PROCESSES 


By W. A. O'N. WAUGH 
The Canberra University College 


1. INTRODUCTION 


The problem that we shall consider can conveniently be introduced by means of the 
following example. Suppose a particle performs a simple one-dimensional random walk, 
making steps at successive discrete epochs of time, its possible positions being represented 
by the positive integers and zero. Suppose that zero is the only absorbing state, and that 
when the particle is at any point j (≥ 1) it has probabilities p to move to j + 1 and q (= 1 − p) 
to move to j − 1 at the next epoch. 

If the particle starts at the point 1 at time 0, and if p > q, then it is well known that the 
probability of ultimate absorption in the state 0 is q/p. We shall refer to the latter event as A. 

A finite sequence $\{x_0, x_1, \ldots, x_n\}$ of non-negative integers, with $x_0 = 1$, can represent 
successive positions of the particle at the first n + 1 epochs, and we shall call it a path of n 
steps. Consider a path in which there are r steps to the right and n − r steps to the left, and 
in which $x_j \ne 0$ for j = 0, 1, ..., n. Then $x_n = 2r - n + 1$, and the probability that the particle 
performs this path and is ultimately absorbed at the origin is 

$P\{x_0, \ldots, x_n; A\} = p^r q^{n-r}(q/p)^{2r-n+1} = p^{n-r-1}q^{r+1}.$ 

Hence, the conditional probability of the path on the hypothesis of ultimate absorption is 

$P\{x_0, \ldots, x_n \mid A\} = p^{n-r}q^r.$ 

This will be seen to be identical with the unconditional probability of the same path for 
a particle which executes a similar random walk but has the roles of p and q interchanged, 
i.e. moves to the right with probability q and to the left with probability p. The same 
interchange can easily be shown to hold if $x_m = 0$ for some m in 1, ..., n. 

This result can be described in more general terms as follows. The first random walk 
defines a measure P{.} over the space of sequences $\{x_0, x_1, \ldots\}$ of non-negative integers. The 
paths are the finite-dimensional measurable subsets, and A is a measurable event of 
probability q/p. The second random walk, with p and q interchanged, defines another measure 
which we can denote by $\tilde P\{.\}$. Then if E is a measurable event 

$P\{E \mid A\} = \tilde P\{E\}.$ (1-1) 
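
The interchange of p and q can be checked by exact enumeration. The sketch below is an illustration only: it assumes p > q, uses the absorption probability (q/p)^x from position x (which follows from the value q/p quoted above for x = 1), and considers every path of n steps that has not reached 0 before its last step. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def positions(steps, start=1):
    """Successive positions x_0, x_1, ..., x_n for a sequence of +1/-1 steps."""
    xs = [start]
    for s in steps:
        xs.append(xs[-1] + s)
    return xs


def walk_probability(steps, p_right):
    """Unconditional probability of the steps when a step to the right has
    probability p_right and a step to the left has probability 1 - p_right."""
    prob = Fraction(1)
    for s in steps:
        prob *= p_right if s == 1 else 1 - p_right
    return prob


def check_interchange(p, n):
    """Compare P{path | A} with the probability of the same path under the
    walk with p and q interchanged; return the number of paths examined."""
    q = 1 - p
    absorb = lambda x: (q / p) ** x   # P{ultimate absorption | position x}, p > q
    checked = 0
    for steps in product((1, -1), repeat=n):
        xs = positions(steps)
        if min(xs[:-1]) < 1:          # path reached 0 before its last step
            continue
        conditional = walk_probability(steps, p) * absorb(xs[-1]) / absorb(1)
        swapped = walk_probability(steps, q)
        assert conditional == swapped
        checked += 1
    return checked


if __name__ == "__main__":
    print(check_interchange(Fraction(2, 3), 8), "paths agree exactly")
```

Exact rational arithmetic is used so that the comparison is an identity rather than a numerical approximation.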


We shall consider the following generalization. Suppose a Markov process is given, 
having a discrete state-space. Denote the measure it generates by P{.}. For many such 
processes events which can be described as 'absorption' in some state or set of states can 
be defined. Call such an event A. We shall construct another Markov process which defines 
a measure $\tilde P\{.\}$ over the same fundamental probability space, so that the relation (1-1) 
holds between the two measures. In other words, we produce a second Markov process 
whose (absolute) probabilities coincide with the conditional probabilities of the original 
process. 




2. NOTATION AND PRELIMINARIES 


The result and proof will be given for Markov processes having a continuous time parameter 
and a discrete state-space. The analogous result for discrete-time processes will easily be 
seen. We recall some points about stationary Markov transition matrices, which are matrices 
whose elements $p_{ij}(t)$ satisfy the conditions 

$p_{ij}(t) \ge 0,$ 
$\sum_j p_{ij}(t) = 1,$ 
$p_{ik}(s+t) = \sum_j p_{ij}(s)\,p_{jk}(t),$ 

for s, t > 0 and for i, j, k = 0, 1, 2, .... 

If such a matrix is given, together with a probability distribution $p_i$, where i = 0, 1, 2, ..., 
which we call the initial probability distribution, a measure is defined on a suitable 
fundamental probability space Ω. The space Ω can be taken as the space of all real valued 
functions of t, or as the space of all non-negative integral-valued step functions of t, where t ≥ 0. 
It is known (see, for example, Doob, 1953) that almost all sample functions of such a process 
are of the latter type, so, without loss of generality, we will adopt the more restricted 
fundamental probability space. We shall occasionally refer to the state-space which 
consists of the non-negative integers. 

For such a process it is also known that if $p_{ij}(t) \to \delta_{ij}$ as $t \downarrow 0$, then the limits 
$\pi_{ij} = \lim_{t\to\infty} p_{ij}(t)$ exist. 

In terms of these limits we establish the following classification of the states. If $\pi_{jj} > 0$ 
then state j is called positive, otherwise it is called dissipative. The positive states are those 
which are recurrent with finite mean recurrence time; the dissipative states include the 
transient states and the states which are recurrent and null. 

The positive states are further subdivided into disjoint positive classes $C^\rho$, for 
ρ = 0, 1, 2, ..., where j and k belong to the same $C^\rho$ if and only if $\pi_{jk} > 0$. These classes are 
closed, i.e. if the system enters a particular $C^\rho$ it cannot subsequently leave it. 

There is a set of numbers $\varpi(i, C^\rho)$ defined for each state i, and for each positive class $C^\rho$, 
such that 

$0 \le \varpi(i, C^\rho) \le 1,$ 
$\varpi(i, C^\rho) = 1$ if $i \in C^\rho$, and $\varpi(i, C^\rho) = 0$ if $i \in C^\sigma$, where σ ≠ ρ, 
$\sum_{j=0}^{\infty} p_{ij}(t)\,\varpi(j, C^\rho) = \varpi(i, C^\rho)$ for t > 0. 

These numbers $\varpi(i, C^\rho)$ may be described as the probabilities that the system, starting 
from state i, will enter, and thereafter remain in, the class $C^\rho$. However, we cannot adopt this 
description until the event to which it refers has been defined as a measurable set in the 
sample space. 


3. MEASURABILITY OF THE CONDITIONING HYPOTHESIS 


Suppose $(\Omega, \mathcal F, P)$ is the probability space, where Ω has already been described, $\mathcal F$ is the 
Borel field of measurable subsets of Ω, and P is the probability measure generated by the 
Markov process. Let $x_t(\omega)$ be the co-ordinate random variable at time t. Then the set 

$X = \{\omega: x_t(\omega) \in C^\rho$ for all sufficiently large $t\}$ 

will not be $\mathcal F$-measurable, because membership of it imposes restrictions on $x_t$ for more than 
countably many values of t. To define a measurable set which can be identified with 
'absorption in one of the positive classes' we have to modify the fundamental probability space. 
We require some results concerning sets which are thick in $(\Omega, \mathcal F, P)$, which we shall 
summarize here. Details may be found in Halmos (1950). Consider any measure space 
$(\Omega, \mathcal F, \mu)$ and denote inner measures by $\mu_*$. A subset $\Omega_0$ of Ω is defined as being thick if 

$\mu_*(E - \Omega_0) = 0$ 

for every measurable set E. It follows that if Ω itself is measurable, which is so in a 
probability space, then $\Omega_0$ is thick if and only if $\mu_*(\Omega - \Omega_0) = 0$. 

The theorem about thick sets which we use is the following. If $\Omega_0$ is a thick subset of 
a measure space $(\Omega, \mathcal F, \mu)$, if $\mathcal F_0$ is the Borel field of all sets of the form $E \cap \Omega_0$, where $E \in \mathcal F$, 
and if a measure of such sets is defined by $\mu_0(E \cap \Omega_0) = \mu(E)$, then $(\Omega_0, \mathcal F_0, \mu_0)$ is a measure 
space. Let 

$\Omega_0 = \{\omega:$ for all s, t > 0, and for each ρ, $x_s(\omega) \in C^\rho \Rightarrow x_{s+t}(\omega) \in C^\rho\}$. 

Consider any set E ($\in \mathcal F$) such that $E \subset \Omega - \Omega_0$. E must consist entirely of step functions 
ω which take a value in some $C^\rho$ at time s and take a value outside that $C^\rho$ at some later time, 
it being understood that any ω which never takes a value in any $C^\rho$ belongs to $\Omega_0$. Since 
transitions out of $C^\rho$ are impossible, P(E) = 0 whence $P_*(\Omega - \Omega_0) = 0$ and so $\Omega_0$ is thick. 

Applying the theorem we can make $\Omega_0$ into a measure space which is a probability space 
since $P_0(\Omega_0) = 1$. Now let $X_0 = X \cap \Omega_0$. Then 

$X_0 = \{\omega: x_s(\omega) \in C^\rho$ for sufficiently large $s\}$ 
      $\cap\ \{\omega:$ for each ρ and for s, t > 0, $x_s(\omega) \in C^\rho \Rightarrow x_{s+t}(\omega) \in C^\rho\}$ 
   $= \{\omega: x_r(\omega) \in C^\rho$ for some integer $r \ge 1\} \cap \Omega_0$ 
   $= Y \cap \Omega_0$, say, where $Y \in \mathcal F$. 

Thus $X_0 \in \mathcal F_0$ (i.e. is $P_0$-measurable) and we identify this event with 'absorption in $C^\rho$'. 
The argument is only changed in detail if the event is to be 'absorption in either $C^\rho$ or $C^\sigma$' 
or more generally in any collection of the classes $C^\rho$. 

The modification that has been made in the fundamental probability space can be simply 
described: from the original space we have removed all points which represent the system 
as entering and subsequently leaving any of the closed sets. From now on we shall suppose 
that this modification has been made, but for simplicity we will drop the suffixes and denote 
the space by $(\Omega, \mathcal F, P)$. 


4. CONSTRUCTION OF THE CONDITIONAL TRANSITION MATRIX 


We shall confine our attention to just one of the positive classes, say C, and we shall write 
for the probability of absorption in C, starting from state i, 

$u_i = \varpi(i, C).$ 

We shall refer to absorption in C as the event A. As before, the generalization to several of 
the positive classes is a matter of detail. 

Let F be the subset of the state-space consisting of all states which have some state of C 
as a consequent. In other words, F consists of all the states of C together with those 
dissipative states from which it is possible to reach some state of C. 

Suppose that, among the initial probabilities, $p_i > 0$ for at least one $i \in F$. Then 

$P\{A\} = \sum_i p_i u_i > 0.$ 

Define a matrix and a set of initial probabilities as follows: 

$\tilde p_{ij}(t) = (u_j/u_i)\,p_{ij}(t)$ $(i \in F)$, 
$\tilde p_{ij}(t) = p_{ij}(t)$ $(i \notin F)$, 
$\tilde p_i = p_i u_i \big/ \sum_{j=0}^{\infty} p_j u_j,$ 

whence $\tilde p_i = 0$ for $i \notin F$. 

This matrix is stochastic because the absorption probabilities satisfy the relation 

$u_i = \sum_{j=0}^{\infty} u_j\,p_{ij}(t)$ for all t > 0. 

Thus, the matrix is a stationary Markov transition matrix and, together with the initial 
probabilities $\tilde p_i$, it defines a Markov process and a measure on the space Ω which we shall 
denote by $\tilde P$. It remains to show that for any measurable set $E \subset \Omega$, 

$P\{E \mid A\} = \tilde P\{E\}.$ 

Let $0 = t_0 < t_1 < \ldots < t_n$ be a finite set of parameter-values, and let $a_0, a_1, \ldots, a_n$ be a set 
of integers in the state-space. Define a set $S_n \subset \Omega$ by 

$S_n = \{\omega: x_{t_0}(\omega) = a_0, \ldots, x_{t_n}(\omega) = a_n\}.$ 

Then the measurable, finite dimensional subsets of Ω are all of the form of $S_n$, and if the 
two measures agree on these subsets they are identical. There are two cases, according as 
there is or is not some $a_j$ among $a_0, \ldots, a_n$ which belongs to the complement of F. If $a_j \in F$ 
for j = 0, ..., n, then 

$\tilde P\{S_n\} = \tilde p_{a_0}\,\tilde p_{a_0a_1}(t_1 - t_0) \cdots \tilde p_{a_{n-1}a_n}(t_n - t_{n-1})$ 
$= \dfrac{p_{a_0}u_{a_0}}{\sum_j p_j u_j}\,\dfrac{u_{a_1}}{u_{a_0}}\,p_{a_0a_1}(t_1 - t_0) \cdots \dfrac{u_{a_n}}{u_{a_{n-1}}}\,p_{a_{n-1}a_n}(t_n - t_{n-1})$ 
$= \dfrac{1}{P\{A\}}\,p_{a_0}\,p_{a_0a_1}(t_1 - t_0) \cdots p_{a_{n-1}a_n}(t_n - t_{n-1})\,u_{a_n}$ 
$= P\{S_n, A\}/P\{A\}$ 
$= P\{S_n \mid A\}.$ 

If, on the other hand, $a_j \notin F$ for some j = 0, 1, ..., n, then it can be shown that 

$\tilde P\{S_n\} = P\{S_n \mid A\} = 0.$ 

Thus in both cases the two measures agree on the finite dimensional subsets, so the result 
is proved. 
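
The construction can be tried on a small example. The sketch below is an illustration under assumptions that are not in the text: it uses the discrete-time analogue, a random walk on 0, 1, ..., N with absorbing barriers at 0 and N, takes C = {0} (so that F = {0, ..., N − 1}), and uses the standard gambler's-ruin formula for the absorption probabilities. All names are hypothetical.

```python
from fractions import Fraction


def ruin_chain(N, p):
    """One-step matrix of a walk on 0..N with absorbing barriers at 0 and N."""
    q = 1 - p
    P = [[Fraction(0)] * (N + 1) for _ in range(N + 1)]
    P[0][0] = P[N][N] = Fraction(1)
    for i in range(1, N):
        P[i][i + 1], P[i][i - 1] = p, q
    return P


def absorption_at_zero(N, p):
    """u_i = probability of absorption at 0 from state i (requires p != q)."""
    rho = (1 - p) / p
    return [(rho ** i - rho ** N) / (1 - rho ** N) for i in range(N + 1)]


def conditioned(P, u):
    """Entries (u_j/u_i) p_ij for i in F (u_i > 0), and p_ij otherwise."""
    size = len(P)
    return [[(u[j] / u[i]) * P[i][j] if u[i] > 0 else P[i][j]
             for j in range(size)] for i in range(size)]


def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


N, p = 5, Fraction(2, 3)
P = ruin_chain(N, p)
u = absorption_at_zero(N, p)
P_tilde = conditioned(P, u)

if __name__ == "__main__":
    for row in P_tilde:
        print([str(x) for x in row], "row sum", sum(row))
```

Every row of the modified matrix sums to 1, and the two-step matrices satisfy the same relation as the one-step matrices, which is the content of the path calculation above in this special case.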







5. THE PROBLEM IN TERMS OF INSTANTANEOUS TRANSITION PROBABILITIES 


A Markov process is frequently defined not by giving the matrix $[p_{ij}(t)]$ explicitly, but by 
giving a matrix Q of constants $q_{ij}$ which satisfy the conditions 

$0 \le q_{ij} < \infty$ when $i \ne j$, for i, j = 0, 1, 2, ..., 
$0 \ge q_{ii} \ge -\infty,$ 
$\sum_{j \ne i} q_{ij} \le -q_{ii}.$ 

Much attention has been given to the problem of determining further conditions on Q 
under which Kolmogoroff's differential equations 

$p'_{ik}(t) = \sum_{j=0}^{\infty} p_{ij}(t)\,q_{jk},$ (5-1) 
$p'_{ik}(t) = \sum_{j=0}^{\infty} q_{ij}\,p_{jk}(t),$ (5-1a) 

with the initial conditions $\lim_{t\to 0} p_{ik}(t) = p_{ik}(0) = \delta_{ik}$, 
possess a unique solution which defines a Markov process. We shall suppose that such 
conditions are fulfilled, in that Q is conservative and regular. The conditions for Q to be 
conservative are that all the $q_{ij}$ are finite, and that $\sum_j q_{ij} = 0$ for all i. For conditions ensuring 
regularity we refer to Feller (1940) and Kato (1954). A method of calculating the absorption 
probabilities $u_i$ directly from the matrix Q has been given by Kendall & Reuter (1957). 
Our problem is, given a matrix $[q_{ij}]$ which generates $[p_{ij}(t)]$ and using these $u_i$, to define a 
matrix $[\tilde q_{ij}]$ which generates $[\tilde p_{ij}(t)]$. Almost obviously the definition is 

$\tilde q_{ij} = (u_j/u_i)\,q_{ij}$ $(i \in F)$, 
$\tilde q_{ij} = q_{ij}$ $(i \notin F)$. 

We merely need to verify that Kolmogoroff's forward equations (5-1), with these $\tilde q_{ij}$ 
introduced, are satisfied by the $\tilde p_{ik}(t)$ defined in § 4, i.e. that 

$\tilde p'_{ik}(t) = \sum_{j \in F} \tilde p_{ij}(t)\,\tilde q_{jk} + \sum_{j \notin F} \tilde p_{ij}(t)\,\tilde q_{jk}.$ (5-2) 

First, suppose that $i \in F$. Then the terms for $j \notin F$ are identically zero and so (5-2) is 

$\tilde p'_{ik}(t) = \sum_{j \in F} \tilde p_{ij}(t)\,\tilde q_{jk}.$ (5-3) 

On substituting for $\tilde p_{ij}(t)$ and $\tilde q_{jk}$ this becomes 

$\dfrac{u_k}{u_i}\,p'_{ik}(t) = \sum_{j \in F} \dfrac{u_j}{u_i}\,p_{ij}(t)\,\dfrac{u_k}{u_j}\,q_{jk}.$ 

Provided that $k \in F$ this is equivalent to 

$p'_{ik}(t) = \sum_{j \in F} p_{ij}(t)\,q_{jk} = \sum_{j=0}^{\infty} p_{ij}(t)\,q_{jk},$ (5-4) 

replacing in the sum the terms for $j \notin F$, all of which are identically zero. It will be seen that 
(5-4) is the forward equation for the original process, and therefore must hold. On the 
other hand if $k \notin F$ both sides of (5-3) and so of (5-2) are identically zero. 

Secondly suppose that $i \notin F$. By similar methods (5-2) can be shown to be equivalent to 
(5-1) for the original process if $k \notin F$, and to reduce to zero on both sides otherwise. 


6. APPLICATIONS AND EXTENSIONS OF THE RESULT 
(a) The simple birth and death process 


We first apply the method to the well-known elementary birth and death process (see, for 
example, Feller 1950, p. 374) where the birth-rate per head per unit of time is λ, and the 
death-rate is μ. The solution is similar to that for the random walk described in § 1. When 
λ > μ extinction of the population has a probability less than 1, and to obtain probabilities 
conditional on the hypothesis of extinction it is merely necessary to interchange λ and μ 
in any formulae derived for the 'unconditioned' process. To verify this, note that the 
matrix of instantaneous transition probabilities has elements defined for n ≥ 1 by 

$q_{n,n-1} = n\mu,$ 
$q_{nn} = -n(\lambda + \mu),$ 
$q_{n,n+1} = n\lambda,$ 

and $q_{nj} = 0$ whenever $j \ne n - 1$, n or n + 1. 
It is known that the probabilities of absorption at zero, or extinction, are given by 
$u_j = (\mu/\lambda)^j$ for j = 1, 2, ... and hence, applying the result of § 5, 

$\tilde q_{n,n-1} = (\mu/\lambda)^{n-1}(\mu/\lambda)^{-n}\,n\mu = n\lambda,$ 

with similar results for the other $\tilde q_{ij}$. 
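
The 'similar results' can be confirmed mechanically. The following sketch applies the rule of § 5, rates multiplied by u_j/u_n with u_j = (μ/λ)^j, to the off-diagonal rates out of a state n ≥ 1 and compares the outcome with the original rates after interchanging λ and μ. The function names and the dictionary representation are illustrative assumptions.

```python
from fractions import Fraction


def simple_bd_rates(n, lam, mu):
    """Non-zero off-diagonal rates out of state n >= 1: deaths to n-1, births to n+1."""
    return {n - 1: n * mu, n + 1: n * lam}


def conditioned_rates(n, lam, mu):
    """Rates (u_j/u_n) q_nj with u_j = (mu/lam)**j, following the rule of Section 5."""
    u = lambda j: (mu / lam) ** j
    return {j: (u(j) / u(n)) * rate
            for j, rate in simple_bd_rates(n, lam, mu).items()}


if __name__ == "__main__":
    lam, mu = Fraction(3), Fraction(2)
    for n in range(1, 4):
        print(n, conditioned_rates(n, lam, mu), simple_bd_rates(n, mu, lam))
```

Since the two off-diagonal rates are merely exchanged, their sum, and hence the diagonal element, is unchanged.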


(b) The general Markovian birth and death process 


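The general Markovian birth and death process can be specified by the matrix whose 
elements are 

$q_{n,n-1} = \mu_n,$ 
$q_{nn} = -(\lambda_n + \mu_n),$ 
$q_{n,n+1} = \lambda_n,$ 

and $q_{nj} = 0$ whenever $j \ne n - 1$, n or n + 1. 
Consider the series 

$T = 1 + \sum_{n=1}^{\infty} \dfrac{\mu_1\mu_2\cdots\mu_n}{\lambda_1\lambda_2\cdots\lambda_n}.$ 

Kendall & Reuter (1957) have shown that the probability of extinction is 1 if T = ∞, 
and that if T < ∞ the extinction probabilities are 

$u_0 = 1,$ 
$u_n = 1 - \dfrac{1}{T}\Bigl(1 + \sum_{s=1}^{n-1} \dfrac{\mu_1\cdots\mu_s}{\lambda_1\cdots\lambda_s}\Bigr),$ 

that is 

$u_n = \dfrac{1}{T}\sum_{s=n}^{\infty} \dfrac{\mu_1\cdots\mu_s}{\lambda_1\cdots\lambda_s}.$ 

The relevant values in forming the modified process are $u_{n+1}/u_n$ and $u_{n-1}/u_n$, and we can 
write these in a convenient form by defining the series 

$T_j = 1 + \sum_{s=1}^{\infty} \dfrac{\mu_j\cdots\mu_{j+s-1}}{\lambda_j\cdots\lambda_{j+s-1}}.$ 

Note that $T_1 = T$. We shall make use of the recurrence relation 

$\dfrac{\mu_{j-1}}{\lambda_{j-1}}\,T_j = T_{j-1} - 1$, where j ≥ 2. 

In terms of these series we obtain 

$\dfrac{u_{n+1}}{u_n} = \dfrac{T_{n+1} - 1}{T_{n+1}}$ 

and 

$\dfrac{u_{n-1}}{u_n} = \dfrac{T_n}{T_n - 1}.$ 

We can now define a new set of parameters $\tilde\lambda_n$ and $\tilde\mu_n$ which give rise to an instantaneous 
transition matrix whose elements are 

$\tilde q_{ij} = (u_j/u_i)\,q_{ij},$ 

as required by our construction. We put 

$\tilde\lambda_n = \dfrac{T_{n+1} - 1}{T_{n+1}}\,\lambda_n = \dfrac{T_{n+2}}{T_{n+1}}\,\dfrac{\mu_{n+1}}{\lambda_{n+1}}\,\lambda_n$ 

and 

$\tilde\mu_n = \dfrac{T_n}{T_n - 1}\,\mu_n = \dfrac{T_n}{T_{n+1}}\,\lambda_n.$ 

In each of these the second form is obtained from the first by means of the recurrence 
relation. Since $q_{n,n+1} = \lambda_n$ it is clear that 
$\tilde q_{n,n+1} = (u_{n+1}/u_n)\,q_{n,n+1}$ and similarly $\tilde q_{n,n-1} = (u_{n-1}/u_n)\,q_{n,n-1}$. 
To verify that $\tilde q_{nn} = q_{nn}$ we require that $\tilde\lambda_n + \tilde\mu_n = \lambda_n + \mu_n$. Now 

$\tilde\lambda_n + \tilde\mu_n = \dfrac{\lambda_n}{T_{n+1}}\Bigl(\dfrac{\mu_{n+1}}{\lambda_{n+1}}\,T_{n+2} + T_n\Bigr)$ 
$= \dfrac{\lambda_n}{T_{n+1}}\,(T_{n+1} - 1 + T_n)$ 
$= \lambda_n + \dfrac{T_n - 1}{T_{n+1}}\,\lambda_n$ 
$= \lambda_n + \mu_n.$ 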











Thus the parameters $\tilde\lambda_n$, $\tilde\mu_n$ for n = 1, 2, ..., give rise to a birth and death process whose 
probabilities agree with the conditional probabilities for the original process, on the 
hypothesis of ultimate extinction. 
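
A numerical check of the identity for the modified birth and death rates may be sketched as follows. It is assumed here that the modified rates are (T_{n+1} − 1)λ_n/T_{n+1} and T_n μ_n/(T_n − 1), with T_j = 1 + Σ_s Π_{k=j}^{j+s−1} μ_k/λ_k, and that the series converges fast enough for a truncation after a few hundred terms to be harmless. The particular rates used in the example are hypothetical and serve only as an illustration.

```python
def T(j, lam, mu, terms=400):
    """Truncated series T_j = 1 + sum over s >= 1 of prod_{k=j}^{j+s-1} mu(k)/lam(k)."""
    total, prod = 1.0, 1.0
    for k in range(j, j + terms):
        prod *= mu(k) / lam(k)
        total += prod
    return total


def conditioned_bd(n, lam, mu):
    """Return the modified rates (lam_tilde_n, mu_tilde_n) for state n >= 1."""
    Tn, Tn1 = T(n, lam, mu), T(n + 1, lam, mu)
    lam_tilde = (Tn1 - 1) / Tn1 * lam(n)
    mu_tilde = Tn / (Tn - 1) * mu(n)
    return lam_tilde, mu_tilde


if __name__ == "__main__":
    lam = lambda k: 2 * k + 1      # illustrative birth rates
    mu = lambda k: k               # illustrative death rates
    for n in range(1, 6):
        lt, mt = conditioned_bd(n, lam, mu)
        print(n, lt + mt, lam(n) + mu(n))
```

With the simple process of (a), for which the rates are nλ and nμ, the same function returns the rates with λ and μ interchanged.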


(c) The discrete-time chain reaction 


We now consider the process which appears in the Galton-Watson problem of the 
extinction of surnames, in the nuclear chain reaction, and in other contexts. It is described 
by Feller (1950) so full details will not be given here. A particle, or individual, in the nth 
generation has probabilities $\{q_k: k = 0, 1, \ldots\}$ to give rise to k particles in the (n + 1)st 
generation, and particles reproduce independently of one another. If $\sum_{k=0}^{\infty} kq_k > 1$ extinction has 
a probability less than 1, and we suppose that this condition is fulfilled. 
Let 

$Q(s) = \sum_{k=0}^{\infty} q_k s^k.$ 

Then the probability of extinction, when the population starts with just one individual, is 
given by that root of 

$\zeta = Q(\zeta)$ 

which is less than 1. Such a root exists and is unique. If there are j particles in the population 
at a given generation the probability of subsequent extinction is $u_j = \zeta^j$. 

In this problem it is of interest to find a distribution $\{\tilde q_k\}$ for numbers of progeny of an 
individual, which will give rise to the modified transition matrix $\tilde p_{ij}$ (for discrete time) 
and hence to the conditional probability measure. Such a distribution is defined by 

$\tilde q_k = \zeta^{k-1}q_k$ (k = 0, 1, ...). 

Note that the generating function of this is given by 

$\tilde Q(s) = \zeta^{-1}Q(\zeta s),$ 

whence $\tilde Q(1) = 1$. In view of the values of the extinction probabilities $u_j$ we must show, 
for the elements of the transition matrices, that 

$\tilde p_{ij} = \zeta^{j-i}p_{ij}.$ 

Now $p_{ij}$ is the coefficient of $s^j$ in $[Q(s)]^i$, while $\tilde p_{ij}$ is the coefficient of $s^j$ in 

$[\tilde Q(s)]^i = [\zeta^{-1}Q(\zeta s)]^i,$ 

whence the required relation follows. 

The mean number of progeny for the distribution $\{\tilde q_k\}$ is given by $\tilde Q'(1) = Q'(\zeta)$, and it 
follows from Feller's discussion that this is strictly less than 1. 
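
As an illustration, the modified progeny distribution can be computed for a small example. The sketch below assumes a finite progeny distribution with q_0 > 0 and finds ζ by iterating ζ ← Q(ζ) from 0, a standard device which is not described in the text; the function names and the example distribution are hypothetical.

```python
def generating_function(q, s):
    """Q(s) = sum over k of q_k s^k for a finite list q = [q_0, q_1, ...]."""
    return sum(qk * s ** k for k, qk in enumerate(q))


def extinction_root(q, iterations=10_000):
    """Smallest non-negative root of zeta = Q(zeta), by iteration from zeta = 0."""
    zeta = 0.0
    for _ in range(iterations):
        zeta = generating_function(q, zeta)
    return zeta


def conditioned_offspring(q):
    """q_tilde_k = zeta**(k-1) * q_k; requires q_0 > 0 so that zeta > 0."""
    zeta = extinction_root(q)
    return [zeta ** (k - 1) * qk for k, qk in enumerate(q)]


def mean(dist):
    """Mean number of progeny for a distribution given as a list."""
    return sum(k * pk for k, pk in enumerate(dist))


if __name__ == "__main__":
    q = [0.2, 0.3, 0.5]                      # mean 1.3 > 1
    q_tilde = conditioned_offspring(q)
    print(extinction_root(q), q_tilde, sum(q_tilde), mean(q_tilde))
```

For this example ζ = 0.4, the modified probabilities sum to 1, and the modified mean is Q'(ζ) = 0.7.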


Yaglom (1947), Hawkins & Ulam (1944), and Otter (1949), have given theorems for the 
chain reaction when $\sum_{k=0}^{\infty} kq_k < 1$, some of which are conveniently summarized by Harris 
(1951). Our preceding remark shows that those theorems which apply when $\sum_{k=0}^{\infty} kq_k < 1$ can 
be used to obtain probabilities, conditional on extinction, when $\sum_{k=0}^{\infty} kq_k > 1$. 







(d) Further remarks on the theory and applications 


The ‘interchange of parameters’ result (a) for the simple birth and death process has 
been applied by Kendall (1956) to the study of the ‘threshold theorem’ for epidemics. It 
seems appropriate to mention here that the general problem arose out of an investigation 
of the special case of the birth and death process, and to thank Mr D. G. Kendall for his 
suggestion of the original problem and for a great deal of other valuable advice. Dr G. S. 
Watson suggested the application (c) to the chain reaction which arose in a current in- 
vestigation of a theory of the size of large molecules. 

The simple birth and death process described in (a) does not reflect in any way the lives 
of individual particles, and models involving some space of ‘trees’ are more suitable when 
problems like that of the age distribution are considered. Details of such a model will be 
given elsewhere but it may be mentioned that the ‘interchange of parameters’ result holds 
for the age distribution. 


REFERENCES 


DOOB, J. L. (1953). Stochastic Processes. New York: Wiley and Sons; London: Chapman and Hall. 

FELLER, W. (1940). On the integro-differential equations of purely discontinuous Markoff processes. 
Trans. Amer. Math. Soc. 48, 488-515. See also corrigendum in 58, 474. 

FELLER, W. (1950). An Introduction to Probability Theory and its Applications. New York: Wiley 
and Sons. 

HALMOS, P. R. (1950). Measure Theory. New York: Van Nostrand. 

HARRIS, T. E. (1951). Some mathematical models for branching processes. Proceedings of the Second 
Berkeley Symposium on Mathematical Statistics and Probability, pp. 305-27. 

HAWKINS, D. & ULAM, S. (1944). Theory of multiplicative processes, I. Los Alamos declassified 
document 265. 

KATO, T. (1954). On the semi-groups generated by Kolmogoroff's differential equations. J. Math. Soc. 
Japan, 6, 1-15. 

KENDALL, D. G. (1956). Deterministic and stochastic epidemics in closed populations. Proceedings 
of the Third Berkeley Symposium on Mathematical Statistics and Probability, IV, pp. 149-66. 

KENDALL, D. G. & REUTER, G. E. H. (1957). The calculation of the ergodic projection for Markov 
chains and processes with a countable infinity of states. Acta math., 97, 103-44. 

OTTER, R. (1949). The multiplicative process. Ann. Math. Statist. 20, 206-24. 

YAGLOM, A. M. (1947). Certain limit theorems of the theory of branching random processes. (Russian.) 
Dokl. Akad. Nauk SSSR (N.S.), 56, 795-8. 








[ 250 ] 


MISCELLANEA 


Ranking means of two normal populations with unknown variances 


By RITA MAURICE 
University College London 


If, on the basis of a single sample from each population, it is desired to rank the means of two normal 
populations with a given probability, P, of a correct ranking when the difference between the means is 
δ, knowledge of the population variances is required to determine the size of the samples on which the 
ranking is to be based. Stein (1945) put forward a two-sample procedure for testing a value of the 
population mean when the population variance is unknown, an initial sample of size $n_1$ being used to estimate 
the variance and hence to determine the size, $n_2$, of the second stage of sampling. This type of procedure 
was adopted by Bechhofer, Dunnett & Sobel (1954) for ranking means of normal populations when the 
variances are in known ratio. For the special case of two populations, similar procedures may be used 
when the variance ratio also is unknown and nothing is known about the variances, σ², σ'². 

The problem consists in determining the value $n_2$ so that the correct selection is made with probability 
P (or greater) when the difference between the population means is μ − μ' = δ > 0. We shall write 
$x_i$, $x'_i$ ($i = 1, 2, \ldots, n_1, \ldots, n_1 + n_2$) for the sampled observations from the two populations; $\bar x_1$, $\bar x'_1$ for the 
means of the observations in the first samples of $n_1$; $\bar x$, $\bar x'$ for the means of the combined $n = n_1 + n_2$ 
observations. 

The first procedure considered consists in (i) taking a sample of $n_1$ from each population; (ii) using 
the samples to estimate σ² + σ'²; (iii) from this estimate determining a value $n_2$; (iv) drawing a second 
sample of $n_2$ from each population; (v) choosing as the population with the larger mean that for which 
the total sample of $n_1 + n_2$ observations has the greater mean. 

An estimate s², of σ² + σ'², may be made by pairing at random observations from each sample, as 
suggested by Bartlett and mentioned by Welch (1937, p. 360) for the problem of testing the difference 
between two means. Thus we may take 

$s^2 = \sum_{i=1}^{n_1} (x_i - x'_i - \bar x_1 + \bar x'_1)^2/(n_1 - 1).$ 

Alternatively we could take 

$s^2 = \sum_{i=1}^{n_1} (x_i + x'_i - \bar x_1 - \bar x'_1)^2/(n_1 - 1).$ 

Writing h for the percentage point of the distribution of Student's t having $n_1 - 1$ degrees of freedom 
which satisfies $\Pr\{t_{n_1-1} > -h\} = P$, $n_2$ is determined from the relation 

$n = n_1 + n_2 = \max\{n_1, [s^2h^2/\delta^2] + 1\},$ 

where $[s^2h^2/\delta^2]$ indicates the largest integer less than $s^2h^2/\delta^2$. Then if $n_2$ is the size of the second sample 
to be drawn from each population, a ranking of the population means corresponding to the ranking of 
the sample means $\bar x$, $\bar x'$ satisfies the required probability condition. That this is the case may be seen as 
follows. 

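The probability of the population with the larger mean having also the larger sample mean is 

$\Pr\{\bar x - \bar x' > 0\} = \Pr\Bigl\{\dfrac{\bar x - \bar x' - \delta}{s/\sqrt n} > -\dfrac{\delta\sqrt n}{s}\Bigr\} = \Pr\{t_{n_1-1} > -\delta\sqrt n/s\}.$ 

But since n has been determined so that $\delta\sqrt n/s \ge h$, it follows that the actual probability is greater than 
(or equal to) the specified value, P. 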
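
The rule for the total sample size can be sketched in a few lines. This is an illustration only: the percentage point h of Student's t with n₁ − 1 degrees of freedom is assumed to be supplied by the user from tables, the observations are assumed to be paired in the order given, and the function names are hypothetical.

```python
import math
from statistics import mean


def paired_variance_estimate(x, x_prime):
    """s^2 = sum (x_i - x'_i - xbar_1 + xbar'_1)^2 / (n_1 - 1), an estimate of
    sigma^2 + sigma'^2 from first samples of equal size n_1, paired in order."""
    n1 = len(x)
    d = [a - b for a, b in zip(x, x_prime)]
    dbar = mean(d)
    return sum((di - dbar) ** 2 for di in d) / (n1 - 1)


def total_sample_size(s2, h, delta, n1):
    """n = max{n_1, [s^2 h^2 / delta^2] + 1}, where [.] denotes the largest
    integer less than its argument; the second sample has size n - n_1."""
    ratio = s2 * h * h / (delta * delta)
    largest_integer_below = math.ceil(ratio) - 1
    return max(n1, largest_integer_below + 1)


if __name__ == "__main__":
    s2 = paired_variance_estimate([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
    print(s2, total_sample_size(s2=4.1, h=2.0, delta=1.0, n1=10))
```

Because the bracket denotes the largest integer strictly less than its argument, the rule gives n equal to s²h²/δ² itself when that quantity is an integer exceeding n₁.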

Another possibility is to take initial samples of size $n_1$ from each population and to make from these 
separate estimates s², s'² of σ², σ'². This was considered by Chapman (1950) for testing the ratio μ/μ'. 
We then take further samples of size $n_2$, $n'_2$, respectively, from the two populations such that 

$n = n_1 + n_2 = \max\{n_1, [s^2h^2/\delta^2] + 1\}; \quad n' = n_1 + n'_2 = \max\{n_1, [s'^2h^2/\delta^2] + 1\}.$ 





For this procedure the chance of a correct choice is given by 

$\Pr\{\bar x - \mu - \bar x' + \mu' > -\delta\} = \dfrac{1}{\sqrt{2\pi}}\int_{-T}^{\infty} e^{-\frac12 u^2}\,du,$ 

where $T = \delta/\{\sigma^2/n + \sigma'^2/n'\}^{1/2}$. This integral of the normal function is an increasing function of n and n'. 
Thus, a lower limit to the value of the probability of a correct choice when μ − μ' = δ may be found by 
regarding the n's as continuous variables and putting each equal to its minimum possible value, i.e. 

$\sqrt n = sh/\delta, \quad \sqrt{n'} = s'h/\delta.$ 


[Fig. 1: sum of expected total samples, E(2n) or E(n + n'), plotted against the sum of initial samples, 2n₁; the level for a single sample with known variances is marked.] 

Fig. 1. Expected total sample sizes, P = 0.95, σ² = σ'², δ = 0.4σ. -.-.-, Ratio of variances known; 
-----, random pairing of sample values; - - -, separate estimation of σ², σ'². 





Thus, the actual probability is greater than (or equal to) 

$\Pr\Bigl\{\dfrac{(\bar x - \mu)\sqrt n}{s} - \dfrac{(\bar x' - \mu')\sqrt{n'}}{s'} > -h\Bigr\}.$ 

The left-hand side of this inequality is distributed as the difference between two independent t variates 
with $n_1 - 1$ degrees of freedom, and h is therefore the percentage point of this distribution and not of 
Student's t. Percentage points of the difference of two independent weighted t variates have been 
calculated by Sukhatme (1938) and his values for a = 45° may be adapted for use here. Additional values 
for small numbers of degrees of freedom have been prepared by Fisher & Healy (1956). 

Expected sample sizes for these two procedures have been calculated using the expressions derived 
by Seelbinder (1953) which assume that, if a second sample is required, n = s²h²/δ², and ignore the 
discontinuity of $n_2$. Some of Seelbinder's tabulated values for n have also been used. In Figs. 1 and 2 the 
results are shown together with the results of similar calculations for the procedure of Bechhofer et al. 
(1954) when the variances are in known ratio. In one case the unknown variances are equal and in the 
other in the ratio 3:1. 

The figures show that the expected sample sizes are not much increased by ignorance of the variance 
ratio. This is presumably because the probability of a correct choice, given δ, depends on the sampling 
variance of the difference of the sample means, σ²/n + σ'²/n'. For single samples of a given combined 
size n + n' = N, say, this variance is the same whether n = n' = ½N or n/n' = σ²/σ'². Thus, the only 
advantage of the Bechhofer, Dunnett and Sobel procedure (in which n/n' = σ²/σ'²) over the random 
pairing of the sample values (in which n = n') lies in the greater number of degrees of freedom in 
estimating σ² and determining h. This advantage is greatest when $n_1$ is small. Random pairing of sample 
values gives better results than considering each population separately especially when the variance 
ratio is large. However, the latter procedure has the advantage of extension to the case when more than 
two populations are being considered. 


[Fig. 2: sum of expected total samples, E(2n) or E(n + n'), plotted against the sum of initial samples, 2n₁; the levels for a single sample with known variances and for a single sample with optimum allocation are marked.] 

Fig. 2. Expected total sample sizes, P = 0.95, 3σ² = σ'², δ = 0.4σ. -.-.-, Ratio of variances known; 
-----, random pairing of sample values; - - -, separate estimation of σ², σ'². 

REFERENCES 


BECHHOFER, R. E., DUNNETT, C. W. & SOBEL, M. (1954). A two-sample multiple-decision procedure 
for ranking means of normal populations with a common unknown variance. Biometrika, 41, 170. 

CHAPMAN, D. G. (1950). Some two-sample tests. Ann. Math. Statist. 21, 601. 

FISHER, R. A. & HEALY, M. J. R. (1956). New tables of Behrens' test of significance. J.R. Statist. 
Soc. B, 18, 212. 

SEELBINDER, B. M. (1953). On Stein's two-stage sampling scheme. Ann. Math. Statist. 24, 640. 

STEIN, C. (1945). A two-sample test for a linear hypothesis whose power is independent of the variance. 
Ann. Math. Statist. 16, 253. 

SUKHATME, P. V. (1938). On Fisher and Behrens' test of significance for the difference in means of 
two normal samples. Sankhyā, 4, 39. 

WELCH, B. L. (1937). The significance of the difference between two means when the population 
variances are unequal. Biometrika, 29, 350. 










Non-randomness in a sequence of two alternatives 


II. RUNS TEST 


By D. E. BARTON and F. N. DAVID 
University College London 


It is assumed that there exists an infinite population composed of two characteristics in the proportions
p and q. A sample of r is randomly drawn from this population, the elements being laid in a line according
to the order of drawing. Under the null hypothesis the alternation of the two characteristics along the
line will be random. Under the alternate hypothesis it will be supposed that the selector is influenced in
the choice of the $(\nu+1)$st element by his knowledge of the $\nu$th element. Such a situation might arise if,
confronted with a very large number of photographs of men and women and asked to pick out r in
ascending age order, the selector tended to choose a photograph of a woman if the one previously selected
was also a woman. We have previously described this situation as one of persistence of type.
Let the suffix i denote the ith ranking position and denote the two characteristics by x and y. Then

\[ P\{x_i\} = p = 1 - q = 1 - P\{y_i\} \qquad (i = 1, 2, \ldots). \]

If in r drawings there are $r_1$ x's and $r_2$ y's the probability distribution of T, the number of runs of both
alternatives, is, under the null hypothesis

\[ P\{T = 2t \mid r_1, r_2, H_0\} = \frac{2\binom{r_1-1}{t-1}\binom{r_2-1}{t-1}\, p^{r_1} q^{r_2}}{\binom{r}{r_1}\, p^{r_1} q^{r_2}} = \frac{2\binom{r_1-1}{t-1}\binom{r_2-1}{t-1}}{\binom{r}{r_1}} \]

and

\[ P\{T = 2t+1 \mid r_1, r_2, H_0\} = \frac{\left[\binom{r_1-1}{t}\binom{r_2-1}{t-1} + \binom{r_1-1}{t-1}\binom{r_2-1}{t}\right] p^{r_1} q^{r_2}}{\binom{r}{r_1}\, p^{r_1} q^{r_2}} = \frac{\binom{r_1-1}{t}\binom{r_2-1}{t-1} + \binom{r_1-1}{t-1}\binom{r_2-1}{t}}{\binom{r}{r_1}}. \]

This result is possibly due to Whitworth (Choice and Chance, Problems 193 and 194) but was probably 
known before him. 
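
As an illustrative check, not part of the original note, the null distribution of the number of runs can be tabulated directly from the two binomial-coefficient expressions. The sketch below assumes $r_1, r_2 \ge 1$; the function name `runs_null_pmf` is hypothetical.

```python
from math import comb


def runs_null_pmf(r1, r2):
    """Null distribution of T (total number of runs) given r1 x's and r2 y's.

    Returns a dict mapping each attainable value of T to its probability.
    Assumes r1 >= 1 and r2 >= 1.
    """
    total = comb(r1 + r2, r1)  # number of equally likely arrangements
    pmf = {}
    for t in range(1, min(r1, r2) + 1):
        # T = 2t: t runs of each kind, and either kind may come first.
        even = 2 * comb(r1 - 1, t - 1) * comb(r2 - 1, t - 1)
        pmf[2 * t] = even / total
        # T = 2t + 1: one kind forms t + 1 runs, the other forms t runs.
        odd = (comb(r1 - 1, t) * comb(r2 - 1, t - 1)
               + comb(r1 - 1, t - 1) * comb(r2 - 1, t))
        if odd:
            pmf[2 * t + 1] = odd / total
    return pmf


if __name__ == "__main__":
    for T, p in sorted(runs_null_pmf(5, 5).items()):
        print(T, round(p, 4))
```

For $r_1 = r_2 = 5$ the 252 arrangements split as 2, 8, 32, 48, 72, 48, 32, 8, 2 over T = 2, ..., 10.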
Under the alternate hypothesis we shall assume that persistence of type can be described by a simple
Markoff chain. We suppose the probabilities for the single event are unaltered but that
\[ P\{x_i \mid x_{i-1}\} = p + q\theta, \qquad P\{y_i \mid x_{i-1}\} = q(1-\theta), \]
\[ P\{x_i \mid y_{i-1}\} = p(1-\theta), \qquad P\{y_i \mid y_{i-1}\} = q + p\theta \qquad (i = 2, 3, \ldots). \]
It will be noticed that
\[ P\{x_i \mid x_{i-1}\} + P\{y_i \mid x_{i-1}\} = 1 \]
and
\[ P\{x_{i-1}\}\,P\{x_i \mid x_{i-1}\} + P\{y_{i-1}\}\,P\{x_i \mid y_{i-1}\} = p \]
as expected. Under these assumptions the distribution of T in the non-null case, i.e. when $\theta \ne 0$, is the
equilibrium position of that described by David (1947). We have, writing


y [ ‘n-1Q,_ 2G, [* +) t+ (1-9) {rpg + O(rp* + r(4—P))} 











+99) (q+p9) t(1—0)(p+q0)(q+p0) 
that P{T =2t| 14,1, Hy} = 5a 4 —_ eee J 
and P{T =(2t+1)| 71,72, Hy} = 5101 10,-1 otghigiph 
x [ 2a +0) +04 PI t PP) erig—?) “| ee a 


Power curves for varying p and $\theta$ have already been discussed by David (1947). When $\theta = 0$ the
distribution reduces to that for $H_0$; the bounds for $\theta$ are $\pm 1$. When $\theta$ is positive the critical region will be
the lower tail of the distribution under $H_0$; when it is negative the upper tail will be appropriate.

The complete description of any sequence is given by the number of elements of either kind and the
two compositions of $r_1$ x's and $r_2$ y's, together with the information as to the nature of the first
observation. When $r_1$ and $r_2$ are decomposed into the same number of components (say t) then T = 2t is the
number of runs. When the number of components differs by 1 the total number of runs is odd. Let us
specify the two compositions by $(a_1, \ldots, a_k)$, $(b_1, \ldots, b_l)$, where

\[ r_1 = \sum_i a_i, \qquad r_2 = \sum_j b_j \qquad \text{and} \qquad |l - k| \le 1. \]










Let $\alpha$ be a characteristic random variable taking the value 1 if the sequence starts with an x and zero
otherwise. Various functions of the compositions have been used as test functions for randomness in
the sequence, generally conditional on $r_1$ and $r_2$.

Now the probability of any sequence of events (each of which must be either x or y) of specification
$[r_1, r_2, \{a_i\}, \{b_j\}, \alpha]$, under the null hypothesis of a sequence of r independent trials is $p^{r_1} q^{r_2}$ and is
independent of the other variables in the specification. This probability is $(\tfrac12)^r$ if p = q. Under the
hypothesis $H_1$ for persistence of type, but with the additional specification that $p = \tfrac12$, the probability of
any given sequence is

\[ (\tfrac12)^r (1-\theta)^{T-1} (1+\theta)^{r-T}, \qquad (1) \]

that is to say it depends just on $\theta$, r and k + l (the total number of components of $r_1$ and $r_2$). We have
called T = k + l in the preceding paragraph. It follows that under these circumstances, for a fixed sample
size r, we have that T is sufficient for $\theta$.

We note that, summing the probabilities of each sequence over all sequences of the same value of T,
and writing $\pi_0(z)$ to denote the probability generating function of T under the null hypothesis,

\[ P(T; H_1) = \left(\frac{1-\theta}{1+\theta}\right)^{T} P(T; H_0) \Big/ \pi_0\!\left(\frac{1-\theta}{1+\theta}\right). \]

It will also be noted that this result is true if we consider the distribution conditional on $r_1$ and $r_2$ since
these do not enter into (1), and we may therefore consider a subset of the sample space to obtain

\[ P(T \mid r_1, r_2; H_1) = \left(\frac{1-\theta}{1+\theta}\right)^{T} P(T \mid r_1, r_2; H_0) \Big/ \pi_0\!\left(\frac{1-\theta}{1+\theta} \,\Big|\, r_1, r_2\right), \]

where $\pi_0(z \mid r_1, r_2)$ is the null hypothesis probability generating function of T conditional on $r_1$ and $r_2$.
Assuming $\theta$ positive, values of the power function for two different sequences and three different values
of $\theta$ are given in Table 1. The critical region was made exactly 0.05 in each case, for purposes of
comparison, by taking a proportion of a frequency block.
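
The reweighting of the null distribution by $[(1-\theta)/(1+\theta)]^T$ and the randomized lower-tail region of size exactly 0.05 can be sketched as follows. This is an illustration rather than the authors' own computation; the function names are hypothetical, and the case $p = \tfrac12$ conditional on $r_1$, $r_2$ is assumed throughout.

```python
from math import comb


def runs_null_counts(r1, r2):
    """Number of arrangements of r1 x's and r2 y's giving each value of T."""
    counts = {}
    for t in range(1, min(r1, r2) + 1):
        counts[2 * t] = 2 * comb(r1 - 1, t - 1) * comb(r2 - 1, t - 1)
        counts[2 * t + 1] = (comb(r1 - 1, t) * comb(r2 - 1, t - 1)
                             + comb(r1 - 1, t - 1) * comb(r2 - 1, t))
    return {T: c for T, c in counts.items() if c}


def power_lower_tail(r1, r2, theta, size=0.05):
    """Power of the lower-tail runs test of exact size `size` against theta."""
    counts = runs_null_counts(r1, r2)
    total = sum(counts.values())
    rho = (1 - theta) / (1 + theta)
    # Probabilities under H1 are the null ones reweighted by rho**T.
    weights = {T: c * rho ** T for T, c in counts.items()}
    norm = sum(weights.values())
    remaining = size  # null probability still to be placed in the region
    power = 0.0
    for T in sorted(counts):
        p0 = counts[T] / total
        take = min(1.0, remaining / p0)  # proportion of this frequency block
        power += take * weights[T] / norm
        remaining -= take * p0
        if remaining <= 1e-15:
            break
    return power


if __name__ == "__main__":
    for theta in (0.25, 0.6, 7 / 9):
        print(theta, round(power_lower_tail(5, 5, theta), 3))
```

For $r_1 = r_2 = 5$ this gives 0.178, 0.587 and 0.842 at $\theta$ = 1/4, 3/5 and 7/9, agreeing with the T row of Table 1.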





Table 1. Powers of T, S and b under the alternate hypothesis of dependence

                                        theta
   r     r_1    r_2    Statistic    1/4      3/5      7/9       0
   10     5      5        T        0.178    0.587    0.842    0.05
                          S        0.126    0.363    0.593     -
                          b        0.055    0.175    0.288     -
  [?]    [?]     4        T        0.178    0.587    0.842    0.05
                          S        0.124    0.362    0.593     -

[?] = entry illegible in the scan.





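For sequences of reasonable length and $r_1$ and $r_2$ not very different it is possible to assume normality
for T under $H_0$. The distribution of T under $H_1$ may, for values of $\theta$ not very different from 0, also be
assumed normal, but the mean and variance will be different. The moments of T under the alternate
hypothesis, when $p = \tfrac12$, can be found by using the same device as in our earlier paper (Barton, David
& Mallows (1958)). Let $\kappa_\nu(\theta)$ and $\kappa_\nu$ denote the $\nu$th cumulants under the alternate and the null hypothesis
respectively. If

\[ \delta = \log[(1-\theta)/(1+\theta)] \]

then

\[ \kappa_\nu(\theta) = \kappa_\nu + \delta\,\kappa_{\nu+1} + \frac{\delta^2}{2!}\,\kappa_{\nu+2} + \frac{\delta^3}{3!}\,\kappa_{\nu+3} + \ldots \]

and in particular

\[ \kappa_1(\theta) = \kappa_1 + \delta\,\kappa_2 + \tfrac12\delta^2\kappa_3 + \tfrac16\delta^3\kappa_4, \]
\[ \kappa_2(\theta) = \kappa_2 + \delta\,\kappa_3 + \tfrac12\delta^2\kappa_4, \]
\[ \kappa_3(\theta) = \kappa_3 + \delta\,\kappa_4, \]
\[ \kappa_4(\theta) = \kappa_4. \]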
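
The truncated expansions can be compared with the exact mean and variance obtained by reweighting the null distribution. The sketch below is an illustrative check with hypothetical function names; it assumes $p = \tfrac12$ and works conditionally on $r_1$, $r_2$.

```python
from math import comb, log


def runs_null_pmf(r1, r2):
    """Null distribution of the number of runs T given r1 x's and r2 y's."""
    total = comb(r1 + r2, r1)
    pmf = {}
    for t in range(1, min(r1, r2) + 1):
        pmf[2 * t] = 2 * comb(r1 - 1, t - 1) * comb(r2 - 1, t - 1) / total
        odd = (comb(r1 - 1, t) * comb(r2 - 1, t - 1)
               + comb(r1 - 1, t - 1) * comb(r2 - 1, t))
        if odd:
            pmf[2 * t + 1] = odd / total
    return pmf


def first_four_cumulants(pmf):
    """Return (k1, k2, k3, k4) of a discrete distribution given as a dict."""
    mean = sum(T * p for T, p in pmf.items())
    mu2, mu3, mu4 = (sum((T - mean) ** j * p for T, p in pmf.items())
                     for j in (2, 3, 4))
    return mean, mu2, mu3, mu4 - 3 * mu2 ** 2


def tilted_pmf(pmf, theta):
    """Distribution of T under H1: null probabilities reweighted by rho**T."""
    rho = (1 - theta) / (1 + theta)
    weights = {T: p * rho ** T for T, p in pmf.items()}
    norm = sum(weights.values())
    return {T: w / norm for T, w in weights.items()}


def series_mean_and_variance(pmf, theta):
    """Truncated cumulant expansions for the mean and variance under H1."""
    k1, k2, k3, k4 = first_four_cumulants(pmf)
    d = log((1 - theta) / (1 + theta))
    mean = k1 + d * k2 + d ** 2 * k3 / 2 + d ** 3 * k4 / 6
    variance = k2 + d * k3 + d ** 2 * k4 / 2
    return mean, variance


if __name__ == "__main__":
    null = runs_null_pmf(5, 5)
    for theta in (0.05, 0.2, 0.6):
        exact = first_four_cumulants(tilted_pmf(null, theta))[:2]
        print(theta, exact, series_mean_and_variance(null, theta))
```

Printing the exact and series values side by side for several $\theta$ shows how the agreement changes as $\theta$ moves away from zero.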


Since the distribution of T under $H_0$ is quickly normal with increasing r, so that $\kappa_\nu$ ($\nu > 2$) tends reasonably
quickly to zero, these expansions should be adequate as regards order for $\theta$ small. The factorial moments










of T under the null hypothesis are of reasonably succinct algebraic form (Barton & David (1957)), but
the central moments and cumulants do not appear to reduce easily. The first four cumulants under $H_0$ are

\[ \kappa_1 = 1 + \frac{2r_1r_2}{r}, \qquad \kappa_2 = \frac{2r_1r_2(2r_1r_2 - r)}{r^2(r-1)}, \]

\[ \kappa_3 = \frac{2r_1r_2}{r(r-1)(r-2)} \left\{ \frac{16r_1^2r_2^2}{r^2} - \frac{4r_1r_2(r+3)}{r} + 3r \right\}, \]

[The general expression for $\kappa_4$ is not recoverable from the scan.]

When $r_1 = r_2$ the above reduce to

\[ \kappa_1 = \tfrac12(r+2), \qquad \kappa_2 = \frac{r(r-2)}{4(r-1)}, \qquad \kappa_3 = 0, \qquad \kappa_4 = -\frac{r(r-2)(r^2-4r+6)}{8(r-1)^2(r-3)}. \]


Table 2. Bivariate distribution of b and T for a sequence (7, 3)

                      T
   b        2    3    4    5    6    7    Total
   0        1    2    4    2    1    -      10
   1        -    2    8   16   14   10      50
   2        -    2    8   16   14   10      50
   3        1    2    4    2    1    -      10
   Total    2    8   24   36   30   20     120
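
The frequencies of Table 2 can be checked by enumerating all 120 arrangements. The sketch below is illustrative only: it assumes that b counts the less numerous characteristic (the three y's) among the first five positions, which is the reading consistent with the row labels 0 to 3, and the function name is hypothetical.

```python
from itertools import combinations


def joint_b_T(r1, r2):
    """Joint frequency table of (b, T) over all arrangements.

    b is taken here as the number of y's among the first r // 2 positions
    (an assumed convention) and T is the total number of runs.
    """
    r = r1 + r2
    R = r // 2  # point of dichotomy (the median when r is even)
    table = {}
    for y_positions in combinations(range(r), r2):
        marked = set(y_positions)
        seq = [i in marked for i in range(r)]  # True marks a y
        T = 1 + sum(seq[i] != seq[i - 1] for i in range(1, r))
        b = sum(1 for i in y_positions if i < R)
        table[(b, T)] = table.get((b, T), 0) + 1
    return table


if __name__ == "__main__":
    table = joint_b_T(7, 3)
    for b in range(4):
        print(b, [table.get((b, T), 0) for T in range(2, 8)])
```

Reversing a sequence leaves T unchanged and sends b to 3 - b, which is why rows 0 and 3, and rows 1 and 2, coincide.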








For $\theta < 0.6$ a satisfactory agreement was found between the true mean and variance of T (as found from
the calculated probability distribution function) and those from the series expansion of the cumulants. This is
not, however, true for the $\kappa_3(\theta)$ and the $\kappa_4(\theta)$ of T, and it is clear that for the series expansion to be useful
in these two cases higher moments of T in the null case must be calculated. To cut the series short, as
has been done in the previous section, will be adequate for $\kappa_3(\theta)$ and $\kappa_4(\theta)$ only when $\theta$ is, very
approximately, < 0.2. In this latter case the distribution of T will be approximated to by the normal
distribution. For large values of $\theta$ the distribution of T is J-shaped and in order to approximate to it by (say)
a Pearson curve, the series expansions for $\kappa_3(\theta)$ and $\kappa_4(\theta)$ will need to be extended.

In our previous paper the powers of S, the sum of the ranks of one characteristic, of b, the number of
one characteristic below the median of the sequence, and of T, were compared under the same alternate
hypothesis. Using the same arguments as we set forward there, we may compare the powers of S and T,
and of b and T, under the dependence alternative, using the bivariate distribution. The arrays of the
distribution of S for T fixed are weighted by $[(1-\theta)/(1+\theta)]^T$ and then added for T keeping S fixed.
The critical region for S under the null hypothesis is the sum of the two tail areas. The power of S to
detect $\theta \ne 0$ is given in Table 1.

The joint distribution of T and b may be written down in explicit algebraic form and the bivariate
table constructed. For the sake of illustration we give the bivariate distribution for $r_1 = 7$ and $r_2 = 3$.
When the total number in the sequence is even it is possible to make a dichotomy at the median and the
table is symmetrical about E(b). It follows that E(b | T) is constant. When r, the number in the sequence,
is odd, we make a dichotomy between the Rth and the (R+1)st observations ($R = \tfrac12(r-1)$), and the
symmetry of the table disappears. The regression of T on b is quadratic under the null hypothesis and
may be found either from the joint probability distribution function or from the following
considerations. Let $T_1$ be the number of runs of both characteristics below the point of dichotomy and $T_2$ the
number above. Then
\[ T = T_1 + T_2 - \alpha, \]










where $\alpha = 1$ if the last element below the point of dichotomy and the first above are of like characteristics
and $\alpha = 0$ otherwise. The conditional expectations we write as

\[ E(T \mid b) = E(T_1 \mid b) + E(T_2 \mid b) - E(\alpha \mid b). \]

It is immediate that

\[ E(T_1 \mid b) = 1 + \frac{2b(R-b)}{R}, \qquad E(T_2 \mid b) = 1 + \frac{2(r_1-b)(r-R-r_1+b)}{r-R}, \]
\[ E(\alpha \mid b) = \frac{b(r_1-b) + (R-b)(r-R-r_1+b)}{R(r-R)}, \]

whence, on substitution,

\[ E(T \mid b) = 1 + 2r_1 - \frac{1}{R(r-R)} \left\{ 2(r-1)b^2 - b\,(r - 2R + 2r_1(2R-1)) + Rr_1(2r_1-1) \right\}. \]

The maximum value of E(T | b) will be when

\[ b = \frac{1}{4(r-1)}\,(r - 2R + 2r_1(2R-1)), \]

which for a median dichotomy of an even number of observations reduces to $b = \tfrac12 r_1$. The regression of
T on b under the alternate hypothesis is approximately, following the argument already set out,

\[ E(T \mid b, H_1) \approx E(T \mid b, H_0) + \delta\,\sigma^2_{T \mid b}(H_0), \]

and may therefore be calculated once $\sigma^2_{T \mid b}(H_0)$ is found. This second moment will be of the fourth power
in b, but since we do not need the regression under $H_1$ we have not calculated it.
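
The quadratic regression of T on b can be verified against a direct enumeration. In this illustrative sketch, whose function names are hypothetical, $r_1$ is read as the total number of the characteristic that b counts, so that the (7, 3) sequence of Table 2 corresponds to r = 10, $r_1$ = 3, R = 5; that reading is an assumption made here.

```python
from itertools import combinations


def expected_T_given_b(b, r, r1, R):
    """Null regression E(T | b) from the closed-form quadratic in b."""
    quad = (2 * (r - 1) * b * b
            - b * (r - 2 * R + 2 * r1 * (2 * R - 1))
            + R * r1 * (2 * r1 - 1))
    return 1 + 2 * r1 - quad / (R * (r - R))


def brute_force_expected_T(r, r1, R):
    """E(T | b) by enumerating every placement of the r1 counted elements."""
    totals, counts = {}, {}
    for positions in combinations(range(r), r1):
        marked = set(positions)
        seq = [i in marked for i in range(r)]
        T = 1 + sum(seq[i] != seq[i - 1] for i in range(1, r))
        b = sum(1 for i in positions if i < R)  # counted elements below the cut
        totals[b] = totals.get(b, 0) + T
        counts[b] = counts.get(b, 0) + 1
    return {b: totals[b] / counts[b] for b in totals}


if __name__ == "__main__":
    for b, value in sorted(brute_force_expected_T(10, 3, 5).items()):
        print(b, value, expected_T_given_b(b, 10, 3, 5))
```

For r = 10, $r_1$ = 3, R = 5 both routes give 4, 5.44, 5.44 and 4 at b = 0, 1, 2, 3, with the maximum of the quadratic at $b = \tfrac12 r_1 = 1.5$.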

The power of b under the alternate hypothesis can be found in precisely the same way as we have put
forward for calculating the power of S. Because b can take few values in a short sequence the power has
been found for a (5, 5) sequence only. The critical region, the sum of the two tail areas of the b-distribution,
has been forced to be 0.05. It will be noticed that b appears to be of little value to detect dependence
in a sequence. For the moments of b under the alternate hypothesis we have not been able to find simple
expressions. Since the denominator of the general expressions is the probability density function of the
number of events in a Markoff chain of two alternatives, and this has not yet been found expressible in
terms of elementary functions, it seems unlikely that the moments of b will be tractable.

In a previous paper we put forward an alternate ranking hypothesis for which S was a sufficient test
statistic. In this paper we have put forward an alternate hypothesis against which T is a sufficient
statistic. The powers of S, T and b have been compared for each model.


REFERENCES 


BARTON, D. E. & DAVID, F. N. (1957). Biometrika, 44, 168.
BARTON, D. E., DAVID, F. N. & MALLOWS, C. L. (1958). Biometrika, 45, 166.
DAVID, F. N. (1947). Biometrika, 34, 335.


Note on multiple comparisons for adjusted means in the analysis of covariance 


By MAX HALPERIN* AND S. W. GREENHOUSE†
National Institutes of Health 


1. INTRODUCTION 


The analysis of covariance, in the simple application to a one-way classification, deals with the problem
of comparing k-class means of a variable y in the presence of a covariate x. If we observe

\[ (y_{i1}, x_{i1}),\ (y_{i2}, x_{i2}),\ \ldots,\ (y_{in_i}, x_{in_i}) \]

in the ith class, the usual assumption (using E to denote expected value of) is that $E y_{ij} = a_i + b x_{ij}$, as
opposed to the customary situation in the analysis of variance where $E y_{ij} = a_i$. Since the mean of the


* Division of Biologics Standards.
† National Institute of Mental Health.










ith class, $\bar y_{i.}$, has the expectation $a_i + b\bar x_{i.}$, then clearly, if $\bar x_{i.} \ne \bar x_{j.}$, $E\bar y_{i.}$ differs from $E\bar y_{j.}$ even if $a_i = a_j$.
One therefore asks whether the k-class means are equal after adjusting all the observations to a common
x, say $\bar x_{..}$, the grand mean of the $x_{ij}$.

This question is ordinarily answered by computing the appropriate F-test (Snedecor, 1956; Anderson
& Bancroft, 1952). The arithmetic of this test is as follows. After calculating the least-squares estimate
of b from within the k classes, assuming there are class differences, one computes the pooled deviations
from regression to yield an error mean square with $\sum_{i=1}^{k} n_i - k - 1$ degrees of freedom. One next computes
the deviations from a regression line fitted to all $\sum_{i=1}^{k} n_i$ observations and subtracts from this the error sum
of squares previously computed. The difference so obtained leads to a mean square between the k classes
with k - 1 degrees of freedom. The ratio of this latter mean square to the error mean square is the
computed F.
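
The arithmetic just described can be sketched in a few lines. This is an illustration under stated assumptions, not code from the note: the function name `ancova_f` and the small data set are hypothetical, a single covariate and a one-way classification are assumed, and the covariate must vary within classes.

```python
def ancova_f(groups):
    """F ratio for equality of adjusted class means in a one-way layout.

    `groups` is a list of classes, each a list of (x, y) pairs.
    Returns (F, (k - 1, n - k - 1)).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)

    def corrected_sums(points):
        # Sums of squares and products about the means of `points`.
        mx = sum(x for x, _ in points) / len(points)
        my = sum(y for _, y in points) / len(points)
        sxx = sum((x - mx) ** 2 for x, _ in points)
        sxy = sum((x - mx) * (y - my) for x, y in points)
        syy = sum((y - my) ** 2 for _, y in points)
        return sxx, sxy, syy

    # Pooled within-class regression: error sum of squares, n - k - 1 d.f.
    wxx = wxy = wyy = 0.0
    for g in groups:
        sxx, sxy, syy = corrected_sums(g)
        wxx, wxy, wyy = wxx + sxx, wxy + sxy, wyy + syy
    ss_error = wyy - wxy ** 2 / wxx

    # Single regression line fitted to all n observations.
    txx, txy, tyy = corrected_sums([p for g in groups for p in g])
    ss_total = tyy - txy ** 2 / txx

    ms_between = (ss_total - ss_error) / (k - 1)
    ms_error = ss_error / (n - k - 1)
    return ms_between / ms_error, (k - 1, n - k - 1)


if __name__ == "__main__":
    base = [(1, 2), (2, 3), (3, 5), (4, 4)]  # hypothetical data
    shifted = [(x, y + 10) for x, y in base]
    print(ancova_f([base, base, shifted]))
```

When every class has the same data the two residual sums of squares coincide and F is zero; shifting the y values of one class produces a positive F on (2, 8) degrees of freedom in this toy layout.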

Now, it is usually the case that we are less interested in a test of homogeneity of the adjusted true
means than in confidence intervals on contrasts among them. Snedecor, in the latest edition of his
Statistical Methods (1956) suggests the usual confidence limit procedure for comparison of two groups; 
for comparison of pairs of adjusted means when there are more than two groups, a sequential test of 
pairs of adjusted means, discussed by Hartley (1955), is proposed. This latter procedure controls errors 
in the multiple comparison sense for testing all pairs of adjusted means. It seems worth mentioning in 
passing that the application of Hartley’s procedure in the comparison of adjusted means appears in- 
appropriate on two counts. In the first place, Hartley’s procedure assumes independence and homo- 
scedasticity of the means being compared; secondly, his procedure assumes equal sample sizes in the 
various classes being compared. The general situation in the comparison of adjusted means fails to meet 
either of these requirements. Thus, results of the use of the Hartley procedure in this application should 
be viewed with some reserve. 

The main purpose of this note is to point out the apparently unrecognized fact that Scheffé’s work 
(1953) on multiple comparisons is immediately applicable to testing and obtaining confidence intervals 
for contrasts among adjusted means. Use of the method is conservative if we are only interested in 
comparison of pairs of adjusted means; however, it has the virtue of being precise in the probability
sense. 

It is well known that, although one thinks of the test previously described as being on the set of k
adjusted means, $\bar y'_i = \bar y_{i.} - \hat b(\bar x_{i.} - \bar x_{..})$, the numerator sum of squares reflecting the variation among the
k adjusted means is not identically equal to $\sum_{i=1}^{k} (\bar y'_i - \bar y'_{.})^2$, the sum of squared deviations of the adjusted
means about their mean. This is so because the $\bar y'_i$ have different variances and are correlated. It is perhaps
for this reason that the applicability of Scheffé's multiple comparison theorem has not been obvious.


2. APPLICABILITY OF SCHEFFE’S MULTIPLE COMPARISON THEORY 


For convenience in discussion we first state the theorem of Scheffé on multiple comparisons in a form 
suitable for this application. 


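THEOREM. Let $\hat\mu_1, \hat\mu_2, \ldots, \hat\mu_k$ have a multivariate normal distribution such that
\[ E\hat\mu_i = \mu_i \qquad (i = 1, 2, \ldots, k), \]
\[ \operatorname{cov}(\hat\mu_i, \hat\mu_j) = a_{ij}\sigma^2 \qquad (i, j = 1, 2, \ldots, k), \]

where the constants $a_{ij}$ are known and $\sigma^2$ is unknown. Let $\hat\sigma^2$ be an estimate of $\sigma^2$ distributed like
$\sigma^2\chi^2/m$ with m degrees of freedom independently of $\hat\mu_1, \hat\mu_2, \ldots, \hat\mu_k$. Let the $\hat\mu_i$ and $\mu_i$ be restricted by

\[ \sum_{i=1}^{k} h_i\hat\mu_i = \sum_{i=1}^{k} h_i\mu_i = h, \]

where the $h_i$ and h are known constants and $\sum_{i=1}^{k} h_i \ne 0$. Then, if $\theta = \sum_{i=1}^{k} c_i\mu_i$ where the $c_i$ are arbitrary
except that $\sum_{i=1}^{k} c_i = 0$ ($\theta$ is a contrast) and the rank of the distribution of the $\hat\mu_i$ is (k - 1) the probability
is $1 - \alpha$ that the values $\theta$, of all possible contrasts, simultaneously satisfy
\[ \hat\theta - Q\hat\sigma_{\hat\theta} \le \theta \le \hat\theta + Q\hat\sigma_{\hat\theta}. \]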










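Here
\[ \hat\theta = \sum_{i=1}^{k} c_i\hat\mu_i, \]
\[ \hat\sigma^2_{\hat\theta} = \sum_{i,j=1}^{k} a_{ij}c_ic_j\hat\sigma^2, \]
and
\[ Q^2 = (k-1)F_\alpha(k-1, m), \]

where $F_\alpha(k-1, m)$ is the upper $100\alpha$ % point of the F-distribution with (k - 1) and m degrees of freedom.
Thus, we need only to verify the correspondence between the theorem and the problem at hand. To
do this we need some notation. Thus, let

\[ \bar y = \bar y_{..}, \]
\[ S_e^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \hat a_i - \hat b x_{ij})^2, \]
\[ \hat a_i = \bar y_{i.} - \hat b\bar x_{i.} \qquad (i = 1, 2, \ldots, k), \]
\[ S_{xxe} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar x_{i.})^2. \]

Now we let $\hat\mu_i = \bar y'_i - \bar y$, and have

\[ \operatorname{cov}(\hat\mu_i, \hat\mu_j) = \left[ \frac{\delta_{ij}}{n_i} - \frac{1}{n_.} + \frac{(\bar x_{i.} - \bar x_{..})(\bar x_{j.} - \bar x_{..})}{S_{xxe}} \right]\sigma^2, \]

where $\delta_{ij} = 1$, $i = j$, $\delta_{ij} = 0$, $i \ne j$, and $n_. = \sum n_i$; this identifies the $a_{ij}$. $S_e^2$ estimates $\sigma^2$ with $\sum n_i - k - 1$ degrees of freedom
and is well known to be independent of $\bar y'_1, \bar y'_2, \ldots, \bar y'_k$ and is thus identified with $(\sum n_i - k - 1)\hat\sigma^2$. The $\hat\mu_i$
are restricted by $\sum n_i\hat\mu_i = 0$, so that $n_i = h_i$, $h = 0$. It remains only to show that the rank of the
covariance matrix of the $\hat\mu_i$, $\boldsymbol\Sigma$ say, is k - 1. We have, aside from a constant multiplier, $\sigma^2$,

\[ \boldsymbol\Sigma = \mathbf D - \frac{1}{n_.}\,\mathbf j'\mathbf j + \frac{1}{S_{xxe}}\,\mathbf d'\mathbf d, \]

where $\mathbf D$ is a k x k matrix with zeros off the diagonal and $n_i^{-1}$ as the ith diagonal element, $\mathbf j$ is a row vector
of ones, $\mathbf d$ is a row vector with elements $d_i = \bar x_{i.} - \bar x_{..}$, and the prime symbol denotes a transpose matrix.
We can write

\[ \boldsymbol\Sigma = \mathbf D\left[ \mathbf I + \frac{(\mathbf{nd})'\mathbf d}{S_{xxe}} - \frac{\mathbf n'\mathbf j}{n_.} \right], \]

where $\mathbf I$ is the identity matrix and $(\mathbf{nd})'$ and $\mathbf n'$ are column vectors with elements $n_id_i$ and $n_i$, respectively;
further factorization of the matrix leads to

\[ \boldsymbol\Sigma = \mathbf D\left[ \mathbf I + \frac{(\mathbf{nd})'\mathbf d}{S_{xxe}} \right]\left[ \mathbf I - \frac{\mathbf n'\mathbf j}{n_.} \right]. \]

Since $\mathbf D(\mathbf I + (\mathbf{nd})'\mathbf d/S_{xxe})$ is non-singular, the rank of $\boldsymbol\Sigma$ is the rank of $(\mathbf I - \mathbf n'\mathbf j/n_.)$ which is easily shown
to be (k - 1). Thus all the conditions of Scheffé's theorem are satisfied.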

It is also obvious that the case of multiple covariates can be treated similarly. The analysis given above
also holds for randomized blocks with $S_{xxe}$ replaced by

\[ S_{xxe} = \sum_{i=1}^{k}\sum_{j=1}^{n} (x_{ij} - \bar x_{i.} - \bar x_{.j} + \bar x_{..})^2, \]

and $S_e^2$ appropriately modified.







3. AN EXAMPLE 


The following example is taken from Snedecor (1956), pp. 404-6. It is a randomized block experiment 
in which the variable of interest (y) is the yield in pounds field weight of ear corn for six varieties of corn 
and the covariate (x) is ‘stand’. The relevant data are as follows, using the notation of § 2. 














   Variety     n_i     x̄_i.      ȳ'_i
      1         4      24.00     191.8
      2         4      25.25     191.0
      3         4      26.50     193.1
      4         4      28.00     219.3
      5         4      27.75     189.6
      6         4      26.50     213.6











If we consider only the analysis of paired comparisons between adjusted means the contrast variance
reduces to

\[ S_{ij}^2 = \left[ \frac{1}{n_i} + \frac{1}{n_j} + \frac{(\bar x_{i.} - \bar x_{j.})^2}{S_{xxe}} \right] s^2, \]

which for this case gives

\[ S_{ij}^2 = \left( \frac12 + \frac{(\bar x_{i.} - \bar x_{j.})^2}{S_{xxe}} \right) s^2 \]

(the numerical values of $S_{xxe}$ and $s^2$ printed at this point are illegible in the scan).

From the preceding discussion we may assert that with probability equal to or greater than 0.95

\[ \bar y'_i - \bar y'_j - 3.84\,S_{ij} \le E(\bar y'_i - \bar y'_j) \le \bar y'_i - \bar y'_j + 3.84\,S_{ij} \]
for all i and j.

Detailed calculation for the largest difference of the $\bar y'_i$ (variety 4 against variety 5) gives a value for
$3.84\,S_{ij}$ of 26.8 so that variety 4 is significantly greater in yield than variety 5. A minimum value for
$3.84\,S_{ij}$ is given by assuming $\bar x_{i.} = \bar x_{j.}$. This gives a value of 26.5 which indicates no further significant
differences between pairs of $\bar y'_i$ except perhaps $\bar y'_4 - \bar y'_1$ and $\bar y'_4 - \bar y'_2$. Calculation of $3.84\,S_{ij}$ for these two
cases shows the differences are not significant.
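
The multiplier 3.84 and the half-width of a pairwise interval can be sketched as follows. Several inputs here are assumptions rather than values taken from the note: the critical value $F_{0.05}(5, 14) \approx 2.96$ is supplied from standard F tables, m = 14 is inferred as (6 - 1)(4 - 1) - 1 for six varieties in four blocks with one covariate, and the values passed for `s2` and `sxx_e` in the demonstration are placeholders, since the printed figures are illegible. The function names are hypothetical.

```python
from math import sqrt


def scheffe_multiplier(k, f_crit):
    """Q = sqrt((k - 1) * F_alpha(k - 1, m)); f_crit is taken from F tables."""
    return sqrt((k - 1) * f_crit)


def pair_half_width(n_i, n_j, xbar_i, xbar_j, s2, sxx_e, k, f_crit):
    """Half-width Q * S_ij of the simultaneous interval for one pair."""
    variance = (1 / n_i + 1 / n_j + (xbar_i - xbar_j) ** 2 / sxx_e) * s2
    return scheffe_multiplier(k, f_crit) * sqrt(variance)


if __name__ == "__main__":
    F_CRIT = 2.96  # assumed tabled value of F_0.05(5, 14)
    print(round(scheffe_multiplier(6, F_CRIT), 2))
    # Placeholder s2 and sxx_e (NOT the values of the corn example):
    print(pair_half_width(4, 4, 28.00, 27.75, s2=2.0, sxx_e=1.0,
                          k=6, f_crit=F_CRIT))
```

The half-width is smallest when the two covariate means coincide, which is the argument used above to obtain a lower bound before examining individual pairs.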

In contrast to this analysis, Snedecor, using the sequential testing of Hartley, asserts that varieties 4 
and 6 differ from the remaining varieties (whose yields are homogeneous) but not from each other. 
However, as earlier remarked, this conclusion should be viewed with reserve. 

Note. Somewhat the same problem is considered in part of a paper by Kramer (1957). The difference 
between Kramer’s paper and this note is that Kramer considers an extension of Duncan’s multiple range 
test, and thus is considering control of an error rate different from that controlled in the Scheffé pro- 
cedure, and in addition there appears to be no proof that for his proposals error control in the Duncan 
sense is either preserved or even that his proposals provide conservative tests.


REFERENCES 


ANDERSON, R. L. & BANCROFT, T. A. (1952). Statistical Theory in Research. New York: McGraw-Hill.

HARTLEY, H. O. (1955). Some recent developments in analysis of variance. Commun. Pure Appl.
Math. 8, 47-72.

KRAMER, C. Y. (1957). Extension of multiple range tests to group correlated adjusted means.
Biometrics, 13, 13-8.

SCHEFFÉ, H. (1953). A method for judging all contrasts in the analysis of variance. Biometrika, 40,
87-104.

SNEDECOR, G. W. (1956). Statistical Methods (5th ed.). Iowa: Iowa State College Press.












An empirical investigation into the distribution of the F-ratio in samples 
from two non-normal populations 


By H. R. B. HACK 
Glasshouse Crops Research Institute, Littlehampton 


1. DESCRIPTION OF THE DATA AND PROBLEM FOR INVESTIGATION 


As a preliminary to the study of the growth of tomato plants under glasshouse conditions a root excava- 
tion was carried out at Cheshunt Research Station in 1950 (Leonard, 1952). Root lengths in 10cm. 
cubes of soil were collected in eight horizons referred to as Depths I-VIII. Each depth formed a 10 x 8 
row and column layout. The values for Depths IV and VI, which are the two non-normal populations 
used in this empirical investigation, are given in Table 1. The frequency distributions are shown in Fig. 1, 




















[Figure 1: two frequency histograms, number of cubes against root length per cube (cm.). Depth IV: mean = 28.6, s = 24.5. Depth VI: mean = 18.9, s = 25.4.]

Fig. 1


Inspection of the data suggested that there were certain major trends as a function of the three dimen- 
sions of the solid excavation. Although it would appear perhaps unlikely that a root sample would have 
a value independent of the neighbouring samples, it seemed necessary, first, in terms of the row and 
column layout to test the null hypothesis that the spatial distribution is statistically indistinguishable 
from that of a population in which the total root length of each member is the sum of a value which all
cubes tend to attain and a randomly distributed error term. This might be attempted by the usual 
variance ratio tests or by a randomization procedure (Fisher, 1947). 

Referring to the assumption of normality underlying the use of statistical tests based on the t and z
distributions, Yule & Kendall (1947, p. 437) summarized a general view:

Some experiments have been made to throw light on the question whether they are true for other types 
of universe. It appears that, provided the divergence of the parent from normality is not too great, the 
results. ..true for normal universes are true to a large extent for other universes. But the whole situation 
is obscure and it is to be hoped that in time investigators will be able to engage in the labour of a closer 
inquiry. 

Table 2 shows the degree of departure from normality measured by $g_1$ and $g_2$ and their standard
errors (Fisher, 1950, § 19, C).

Depth I has been subdivided into two blocks of forty cubes owing to heterogeneity. At Depth I in
rows (1-5) only is $g_1$ less than its standard error. For the other half, rows (6-10), $g_1$ is approximately






Table 1. Root length (cm.) per cube of soil 






































Depth IV 

Column | 
2 3 4 5 6 7 8 9 

Row | 
1 16 16 «30 «6©66206—6C88lhlU4t (mt Ch 
2 OR ea RS oe 
3 4 20 34 69 21 16 35 Ti 
4 0 7 15 2 Ss Se Se 
5 0 ‘-- ss 2 he 2 OS 
6 0 4 2 10 6 41 £462 «2111 
7 0 3 ee = of we oe 
8 15 is ws 6« # @ 
9 ae a a a ee ee 
10 | 10 4 7 5 41 19 35 49 

Depth VI 
Column | oma 

| 2 3 4 5 6 7 s 9 

Row 

| 
1 120 10 30 9 2 so @ 
2 165 a | 1 7 7 22 
| 3 o> we 8 9 3 2 Nn 
| 4 29 5 =. 20 0 5 6 5 2 
| 5 ‘Se eS oe eee ie ee 
| 6 | 5 : w 6 70 14 3 28 
| 7 | 27 > w= » 2 6 32 
| 8 | 68 42 24 2 5 tt 
| 9 | 69 30 5 7 9 6 2 16 
| 10 ou 9 6 20 21 5 0 2 










$2.4\sigma_{g_1}$. At successively greater depths $g_1$ is 5.4 to 15.8 times its standard error. Evidently the skewness
is strongly developed, in the sense that there are a few extremely high values and a large number of
intermediate low ones. (Values for root length per cube less than zero are, of course, impossible.) The
value of $g_2$ also becomes progressively larger down to Depth VII.

This degree of departure from normality is large enough to render the validity of variance ratio (F) 
tests uncertain for use in testing for lack of homogeneity in sample populations of variances. If such 
tests could, however, be employed, some aspects of spatial distribution might be approached by an 
analysis of variance of root length in terms of the three dimensions of the volume investigated. 

In the present row and column layout the row and column and residual mean squares are estimates of 
the variance at any one depth, if no row and column effects exist, and the question is whether the ratios 
of these estimates follow the F distribution. The relevant scheme of randomization must allow the eighty
observations at one depth to be freely distributed among the eighty cells available. Earlier studies have 
considered arrangements with more restricted, or otherwise different, systems of randomization. The 
results from these may not be directly applicable here. 


Table 2. Departure from normality of frequency distribution of root length per cube

                                       Depth
            I          I
          Rows       Rows       II     III     IV      V      VI     VII    VIII
          (1-5)     (6-10)
  g_1      0.20      0.94      1.5    1.4     1.5    2.5     3.6     4.1    1.1
  g_2     -0.57      0.46      3.2    1.8     3.0    7.0    15.9    20.9    2.[?]

[?] = digit illegible in the scan.

Standard errors: n = 40, $\sigma_{g_1}$ = 0.37, $\sigma_{g_2}$ = 0.73; n = 80, $\sigma_{g_1}$ = 0.27, $\sigma_{g_2}$ = 0.53.
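
The quoted standard errors agree with the usual normal-theory sampling variances of Fisher's $g_1$ and $g_2$. The sketch below evaluates those standard formulas; that the author used exactly these expressions is an assumption, and the function names are hypothetical.

```python
from math import sqrt


def se_g1(n):
    """Normal-theory standard error of g1 for a sample of size n."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_g2(n):
    """Normal-theory standard error of g2 for a sample of size n."""
    return sqrt(24 * n * (n - 1) ** 2
                / ((n - 3) * (n - 2) * (n + 3) * (n + 5)))


if __name__ == "__main__":
    for n in (40, 80):
        print(n, round(se_g1(n), 2), round(se_g2(n), 2))
```

Dividing the tabulated $g_1$ and $g_2$ by these values gives the multiples of the standard error discussed in the text.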


The distribution under randomization for the z transformation of the variance ratio was studied 
empirically by Eden & Yates (1933). Their data consisted of eight sets of thirty-two observations of the 
height of barley. All these sets showed less departure from normality than found here. The data were 
then amalgamated into eight sets of four to simulate a randomized block design, a process which would 
be expected to reduce skewness. They made 1000 random arrangements of the resulting four ‘treat- 
ments’ in eight blocks. They concluded that the empirically determined z distribution did not deviate 
significantly from that expected from a normal population. 

Later, Welch (1937) developed the problem theoretically in terms of a statistic U which is a mono- 
tonically increasing function of z and applied his results to uniformity trial data including both random- 
ized block and Latin-square designs. He concluded (p. 47): 


For Randomized Blocks the cases considered showed close enough agreement between the randomization 
and normal theory variances of U. In each of the three uniformity trials for Latin Square, however, the 
randomization variance of U was considerably smaller than that of the normal theory. 


It should be noticed that the randomized block restricts randomization more than does the present 
layout. Random rearrangement occurs within blocks only, equivalent, for example, in our case to rows, 
the block sum of squares, however, remaining constant. The columns would then correspond to treat- 
ments. The restriction on randomization is greater still in the case of the Latin square. 

Box & Anderson (1955) have discussed the use of permutation tests to assess the effect of departures 
from normality on standard statistical tests, with particular reference to the comparison of means and 
the comparison of independent variances, where the standard of comparison is a value derived from a 
theoretical population. They were able to show the importance of kurtosis as the major disturbing factor 







and demonstrated the calculation and use of modified degrees of freedom for obtaining the distribution 
of the relevant statistics. 

It would appear that there is evidence to suggest that, on the one hand, since we have free
randomization in the n x s cells there may be a smaller tendency for our F distribution to depart from normal than
is found in Welch's work (his theory is not directly applicable), yet, on the other hand, the high degree
of non-normality might give rise to considerable deviation in the F distribution.*


Table 3. Empirical distribution of F under randomization

Depth IV

     Frequency distribution for F_c              Frequency distribution for F_r
              Class        Obs.   Exp.                    Class        Obs.   Exp.
   F_c      >= 2.16          4      5          F_r      >= 2.04          3      5
            1.82-2.15        5      5                   1.74-2.03        7      5
            1.33-1.81       11     15                   1.31-1.73       18     15
            0.92-1.32       27     25                   0.94-1.30       28     25
   1/F_c    1.09-1.64       31     25          1/F_r    1.07-1.53       25     25
            1.65-2.50       10     15                   1.54-2.20       11     15
            2.51-3.29        6      5                   2.21-2.78        4      5
            >= 3.30          6      5                   >= 2.79          4      5

     χ² = 4.94, P = c. 70%, D.F. = 7.            χ² = 4.03, P = 70-80%, D.F. = 7.

Depth VI

     Frequency distribution for F_c              Frequency distribution for F_r
              Class        Obs.   Exp.                    Class        Obs.   Exp.
   F_c      >= 2.16          0      5          F_r      >= 2.04          3      5
            1.82-2.15        1      5                   1.74-2.03        3      5
            1.33-1.81       18     15                   1.31-1.73       11     15
            0.92-1.32       27     25                   0.94-1.30       24     25
   1/F_c    1.09-1.64       33     25          1/F_r    1.07-1.53       30     25
            1.65-2.50       19     15                   1.54-2.20       24     15
            2.51-3.29        2      5                   2.21-2.78        2      5
            >= 3.30          0      5                   >= 2.79          3      5

     χ² = 19.39, P = c. 2%, D.F. = 7.            χ² = 11.71, P = 20%, D.F. = 7.

* An approximate theoretical approach is outlined by N. L. Johnson in the Note on p. 265.
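
The χ² values quoted beneath Table 3 follow from the observed and expected class frequencies. The sketch below recomputes them; the labels F_c (columns, 7 and 63 D.F.) and F_r (rows, 9 and 63 D.F.) follow a reading of the class limits against the 5 % points and are an assumption, as are the Python names.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic for matching frequency classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


EXPECTED = [5, 5, 15, 25, 25, 15, 5, 5]

OBSERVED = {
    ("IV", "F_c"): [4, 5, 11, 27, 31, 10, 6, 6],
    ("IV", "F_r"): [3, 7, 18, 28, 25, 11, 4, 4],
    ("VI", "F_c"): [0, 1, 18, 27, 33, 19, 2, 0],
    ("VI", "F_r"): [3, 3, 11, 24, 30, 24, 2, 3],
}

if __name__ == "__main__":
    for key, obs in OBSERVED.items():
        print(key, round(chi_square(obs, EXPECTED), 3))
```

Most of the Depth VI column statistic comes from the two empty extreme classes, the deficiency in the tails noted in the text.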










2. FREQUENCY DISTRIBUTION OF VARIANCE RATIO IN 100 RANDOM ARRANGEMENTS 


To obtain empirical information on the distribution of the F statistic under randomization, one hundred 
random selections of the possible permutations of the eighty values for each of the two Depths IV and 
VI, having different degrees of departure from normality, were made by means of random numbers 
(Fisher & Yates, 1949). Each value may, therefore, appear on sampling in any row or column. 

For each of these random selections, calculations were made of: 

(a) $F_r$ = (M.S. rows)/(M.S. residual) for 9 and 63 D.F.

(b) $F_c$ = (M.S. columns)/(M.S. residual) for 7 and 63 D.F.
These are shown in Table 3 for Depths IV and VI. The frequencies in the tail ends of the distribution
curve are especially important. The class intervals in the tail ends were determined by the restriction
that no expected frequencies should be fewer than five, as recommended for the χ² test, owing to the
continuity of the distribution of this quantity. The observed frequency of finding a value of F (or of
1/F for the lower % points) as great or greater than that shown for the 5, 10, 25 and 50 % levels for the
appropriate degrees of freedom (e.g. as given in the Merrington and Thompson Tables (1943)) is
compared with that expected from the level of probability.


(a) Depth IV 


Evidently the values of F observed do not differ significantly in frequency from values which would
be expected from random arrangement of a normal population. In other words, the mean square for
rows and the mean square for columns are as good estimates of the true variance as one could expect
from a normal population.


(b) Depth VI 
The distribution of F for the population found in Depth VI shows important differences from that 
found in Depth IV. 
With 9 D.F. available for the between-rows M.S., the observed value of χ² has P = 0.20, although there
is a tendency towards excess of intermediate low values and deficiency in the tails. With 7 D.F. available
for the column M.S. the frequencies of F are significantly different from those expected from a normal
population. There is a deficiency of large and small values of F, and also an excess of intermediate low
values as appeared in the case of rows.
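
The χ² values quoted for Depth VI can be checked directly from the observed and expected class frequencies of Table 3. The following is a minimal sketch in Python; the function and variable names are illustrative only, and the assignment of the two frequency columns to columns and rows is inferred from the class limits.

```python
def chi_square(observed, expected):
    """Pearson's chi-square: sum of (O - E)^2 / E over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Expected frequencies for the eight classes (100 random arrangements).
expected = [5, 5, 15, 25, 25, 15, 5, 5]

# Observed frequencies at Depth VI, read from Table 3.
observed_columns = [0, 1, 18, 27, 33, 19, 2, 0]   # classes starting at F >= 2.16
observed_rows = [3, 3, 11, 24, 30, 24, 2, 3]      # classes starting at F >= 2.04

chi_columns = chi_square(observed_columns, expected)
chi_rows = chi_square(observed_rows, expected)
print(round(chi_columns, 2), round(chi_rows, 2))
```

With eight classes there are 7 D.F., as stated under the table.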


3. CONCLUSIONS 


These results will apply only to the present randomization system, which is characterized by free
distribution among the cells of the two-way table, with the degrees of freedom available. In order to
generalize further it should be borne in mind that we are assuming that not only the random number
tables used here, but also other such tables, will always sample in the same manner (Kendall, 1941).


Table 4. Empirical % points and standard deviation of F 
































                      F_c (columns)                              F_r (rows)
              Lower          Upper                      Lower          Upper
            5%     10%     10%     5%     S.D.        5%     10%     10%     5%     S.D.
Depth IV   0.27    0.36    1.78    2.12   0.54       0.36    0.52    1.78    2.03   0.50
Depth VI   0.44    0.49    1.48    1.56   0.37       0.45    0.51    1.52    1.96   0.43

















Some theoretical values of these quantities are given by N. L. Johnson in a table on p. 266. 







With non-normality of the order found in Depth IV it seems that standard tables for F may be used 
for tests of significance in the analysis of variance. 

In such an extreme case as Depth VI, the standard deviation of F is much reduced (see Table 4, where
the 5 and 10 % points are also given), especially in the case of F_c, which has slightly fewer degrees of
freedom than F_r. The standard tables would underestimate the frequency of occurrence of cases showing
significance at the 5 % level. A complete mathematical analysis would evidently be desirable to cover
cases where the non-normality is as great or greater than has been observed here over a range of designs
with varying degrees of restriction on randomization.


I should like to thank Prof. E. S. Pearson for his stimulating suggestions on the implications of the
empirical randomization.


REFERENCES 


Box, G. E. P. & Andersen, S. L. (1955). J.R. Statist. Soc. Series B, 17, 1.

Eden, T. & Yates, F. (1933). J. Agric. Sci. 23, 6.

Fisher, R. A. (1947). The Design of Experiments, 4th ed., p. 43. Edinburgh: Oliver and Boyd.

Fisher, R. A. (1950). Statistical Methods for Research Workers, 11th ed. pp. 52, 75. Edinburgh:
Oliver and Boyd.

Fisher, R. A. & Yates, F. (1949). Statistical Tables for Agricultural, Biological and Medical Research,
3rd ed. Edinburgh: Oliver and Boyd.

Kendall, M. G. (1941). Biometrika, 32, 1.

Leonard, E. R. (1952). Rep. 13th Int. Hort. Congr. 2, 885.

Merrington, M. & Thompson, C. M. (1943). Biometrika, 33, 73.

Welch, B. L. (1937). Biometrika, 29, 21.

Yule, G. U. & Kendall, M. G. (1947). Theory of Statistics, p. 437. London: Griffin and Co.


Theoretical considerations regarding H.R.B. Hack’s system of randomization 
for cross-classifications 


Note by N. L. JOHNSON 
University College London 


The empirical distribution of the F-ratio investigated by H. R. B. Hack is similar to the randomization
distribution studied by Welch (Biometrika, 29, 21-52 (1937)). In the latter case the system of
randomization was restricted by the condition that observed values should remain in their original rows:
Hack, on the other hand, allows all rearrangements of the original data among the cells of the n × s
table to be possible.

Welch gives formulae for the randomization mean and variance of

\[ U = \frac{\text{between column sum of squares}}{\text{between column sum of squares} + \text{residual sum of squares}}. \]


These are &(U) = 1/n; var(U) = 7), 
where A= (> us? (SD vis)? 
a 2 


and u_ij = (original observation in ith row and jth column) - (mean of all observations in ith row).


The expected value of U in Hack's case is also 1/n; in the formula for var(U), A needs to be replaced
by Ā, the average value of A over all possible assignments of observed values to rows. The exact
calculation of Ā appears to be difficult, but if we neglect the row means (i.e. assume that their deviations from
the grand mean are always zero) we find that

4 

qe PP a s(n—1)_ 2-1 

(Yui)? ns—1 ns 
ij 


4 
~ n%(s—1) ns—1 (Suis)? 
tj 













The randomization distribution of U might be approximated by a type I distribution having correct 
terminals and first two moments. If this is so, the corresponding distribution of the ratio 


(between columns mean square)/(residual mean square) 

would be an F distribution with degrees of freedom

\[ \nu_2 = (n-1)\,\nu_1; \qquad \nu_1 = \frac{(s-1)(ns-1)}{ns-g} - \frac{2}{n}, \]

where

\[ g = ns \sum_{i,j} u_{ij}^{4} \Big/ \Bigl(\sum_{i,j} u_{ij}^{2}\Bigr)^{2}. \]

Similar expressions would apply to the approximate distribution of the ratio
(between rows mean square)/(residual mean square).
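
The 'equivalent' degrees of freedom for columns can be evaluated directly from the expression above. The sketch below is illustrative only: the function name is hypothetical, g is taken to be the kurtosis-type ratio ns Σu⁴/(Σu²)², and the values n = 10 rows, s = 8 columns correspond to Hack's eighty observations. Hack's own values of g are not quoted in the Note, so only the normal-theory case g = 3 is checked.

```python
def equivalent_df_columns(n, s, g):
    """Equivalent degrees of freedom (nu1, nu2) for the columns F-ratio.

    n: number of rows, s: number of columns,
    g: ns * sum(u^4) / (sum(u^2))^2 (assumed reading of the garbled definition).
    """
    nu1 = (s - 1) * (n * s - 1) / (n * s - g) - 2 / n
    nu2 = (n - 1) * nu1
    return nu1, nu2

# With g = 3 (the normal-theory value) the result is close to 7 and 63.
nu1, nu2 = equivalent_df_columns(n=10, s=8, g=3.0)
print(round(nu1, 2), round(nu2, 2))
```

Values of g above 3 increase both ν₁ and ν₂, which is the direction shown for Depths IV and VI.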


Using the data provided by Hack we obtain the ‘equivalent’ degrees of freedom shown below 























                   Columns (for F_c)        Rows (for F_r)
                     ν₁       ν₂              ν₁       ν₂
Normal theory        7        63              9        63
Depth IV             7.3      65.4            9.4      65.5
Depth VI             8.8      79.6           11.4      79.7
| 





Using these degrees of freedom to obtain percentage points for F, we have (after interpolation)

                         Columns (for F_c)                          Rows (for F_r)
                   Lower          Upper                       Lower          Upper
                  5%     10%     10%     5%     S.D.         5%     10%     10%     5%     S.D.
Normal theory    0.30    0.40    1.82    2.16   0.59        0.36    0.45    1.74    2.04   0.53
Depth IV         0.31    0.41     ...    2.15   0.58        0.37    0.46    1.73    2.02   0.52
Depth VI         0.36    0.45    1.68    2.01   0.52        0.41    0.50    1.68    1.91   0.47


























The values of the standard deviation of F were obtained from the expression

\[ \sigma_F = \frac{\nu_2}{\nu_2 - 2} \left\{ \frac{2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 4)} \right\}^{1/2}. \]
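
As a check on the tabulated standard deviations, the expression can be evaluated for the degrees of freedom listed earlier in the Note. This is a small sketch with a hypothetical function name; it uses the standard formula for the standard deviation of an F variate.

```python
from math import sqrt

def f_standard_deviation(nu1, nu2):
    """Standard deviation of an F variate with (nu1, nu2) degrees of freedom (nu2 > 4)."""
    return nu2 / (nu2 - 2) * sqrt(2 * (nu1 + nu2 - 2) / (nu1 * (nu2 - 4)))

# (nu1, nu2) pairs from the table of equivalent degrees of freedom.
cases = {
    "normal, columns": (7, 63),
    "normal, rows": (9, 63),
    "Depth IV, columns": (7.3, 65.4),
    "Depth IV, rows": (9.4, 65.5),
    "Depth VI, columns": (8.8, 79.6),
    "Depth VI, rows": (11.4, 79.7),
}
sds = {name: round(f_standard_deviation(*df), 2) for name, df in cases.items()}
for name, sd in sds.items():
    print(name, sd)
```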


These results suggest that the Depth IV distribution should be only slightly affected by the non- 
normality, while at Depth VI considerably less variability than that predicted on normal theory would 
be expected. Phenomena of this nature are in fact exhibited by the observed values in Hack’s Tables III 
and IV, though the actual results for columns at Depth VI are considerably less variable than those 
predicted by the approximate theory outlined above. 







On the equivalence of two tests of equality of rate of occurrence in two series of 
events occurring randomly in time 


By D. E. BARTON 
University College London 


We suppose two time periods of length t₁ and t₂ in which n₁, n₂ events are observed. On the assumption
that each is the realization of a Poisson process, i.e. that, for some constants λ₁, λ₂,

\[ p(n_i) = \frac{e^{-\lambda_i t_i} (\lambda_i t_i)^{n_i}}{n_i!} \qquad (i = 1, 2), \tag{1} \]

it is desired to test the hypothesis: λ₁ = λ₂.

Przyborowski & Wilenski (1939) noted that, conditionally on n₁ + n₂ = n, n₁ is binomially distributed
with parameters n and p = t₁/(t₁ + t₂). Considering the symmetrical case t₁ = t₂ they proposed the
rejection of the null hypothesis, with a first kind of error α, if

\[ \text{(a)}\ n_1 \le k_1(n, \alpha) \quad \text{or} \quad \text{(b)}\ n_2 \le k_2(n, \alpha), \]

where

\[ P\{ n_i \le k_i(n, \alpha) \mid n,\ \lambda_1 = \lambda_2 \} \le \tfrac{1}{2}\alpha \tag{2} \]

and where, in this case, k₁(n, α) = k₂(n, α). Their method extends directly to the case t₁ ≠ t₂, the only
difference being that p ≠ ½, in general, so that the rejection values k₁ and k₂ are different from one another.


Cox (1953) proposed what was apparently a very different test. It is the purpose of this note to show 
that the two tests are substantially equivalent. 


Table 1. Table of values taken by u, v, w for different values of n₁, when n = 10, 20 and p = ½

         n₁     0      1      2      3      4      5      6      7      8      9      10
n = 10   u    0.000  0.006  0.033  0.113  0.274  0.500  0.726  0.887  0.967  0.994  1.000
         v    0.001  0.011  0.055  0.172  0.377  0.623  0.828  0.945  0.989  0.999  1.000
         w    0.000  0.004  0.026  0.102  0.265  0.500  0.735  0.897  0.974  0.996  1.000

n = 20   u    0.000  0.000  0.000  0.001  0.004  0.013  0.039  0.095  0.192  0.332  0.500
         v    0.000  0.000  0.000  0.001  0.006  0.021  0.058  0.132  0.252  0.412  0.588
         w    0.000  0.000  0.000  0.001  0.003  0.012  0.036  0.089  0.186  0.328  0.500
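
The u and v rows of Table 1 can be reproduced from the binomial distribution alone: v(n₁) is the cumulative distribution function of n₁, and u(n₁) is the average of v(n₁) and v(n₁ - 1). The sketch below uses hypothetical function names and takes v(-1) = 0; the w row needs the incomplete β-function and is not attempted here.

```python
from math import comb

def v_transform(n, p):
    """v(n1) = P(N1 <= n1) for N1 binomial with parameters n and p."""
    values, running = [], 0.0
    for k in range(n + 1):
        running += comb(n, k) * p ** k * (1 - p) ** (n - k)
        values.append(running)
    return values

def u_transform(v):
    """u(n1) = (v(n1) + v(n1 - 1)) / 2, with v(-1) taken as 0."""
    return [0.5 * (v[k] + (v[k - 1] if k > 0 else 0.0)) for k in range(len(v))]

v10 = v_transform(10, 0.5)
u10 = u_transform(v10)
v20 = v_transform(20, 0.5)
u20 = u_transform(v20)
print([round(x, 3) for x in u10])
print([round(x, 3) for x in v10])
```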





Let us consider the upper tail for simplicity, the argument for the lower tail being the same. Then,
using the incomplete β-function representation of a sum of binomial probabilities, the rule (b) may
be expressed,

\[ I_p(n_1, n_2 + 1) \le \tfrac{1}{2}\alpha, \tag{3} \]

which amounts to transforming n₁ by the discrete analogue of the probability integral transformation
using the v-form of David & Johnson (1950), i.e. where the transform v = v(n₁) is the cumulative
probability distribution function of n₁. Let us modify this slightly to

\[ I_p(n_1 + \tfrac{1}{2},\ n_2 + \tfrac{1}{2}) \le \tfrac{1}{2}\alpha, \tag{4} \]

so that the first kind of error is not made so strongly less than α by the discontinuity of the distribution.
The form of 'continuity correction' is in practice hardly to be distinguished from David & Johnson's
u-form of discrete probability integral transformation, where u = u(n₁) = ½(v(n₁) + v(n₁ - 1)). Thus,
denoting the left-hand side of (4) by 1 - w, and taking for example the values (20, ½), (10, ½) for (n, p),
u and w are seen in Table 1 to run through a very nearly equal set of values as n₁ goes from 0 to n. Since
the departure from rectangularity manifests itself in a decreased variance it is of interest to note that
var u = 0.0805, var w = 0.0849 for n = 10 compared with the rectangular value of 1/12. The inequality
(4) may be written in terms of the probability distribution function P_{ν₁,ν₂}(F) of the F distribution of
(ν₁, ν₂) degrees of freedom,

\[ P_{2n_1+1,\,2n_2+1}\!\left( \frac{p\,(2n_2+1)}{(1-p)(2n_1+1)} \right) \le \tfrac{1}{2}\alpha. \tag{5} \]

Now if F₋(ε; ν₁, ν₂) is the lower 100ε % point of an F having ν₁, ν₂ degrees of freedom, we may write (5) as

\[ \frac{p}{1-p} \cdot \frac{2n_2+1}{2n_1+1} \le F_{-}(\tfrac{1}{2}\alpha;\ 2n_1+1,\ 2n_2+1) \tag{6} \]


which is the lower tail rejection rule of Cox. 
It will be noted that without the continuity correction (4) Przyborowski & Wilenski’s test consists 
of using the rejection regions described in Cox’s language by 


t, 2 2 
5 Mat" > F (40 2n,, 2n,+ 2), 
2 omy 





(a) 


t, 2ng+2 


(6) << F_(4a; 2n,, 2n, +2). 
t, 2n, 





This test has an error of the first kind certainly less than α, but generally more so than is desirable whereas
Cox's modification has an error which will be greater or less than α, randomly, according to n (and also
varying with p). This error will be approximately α when averaged over n (as it must be in the experi-
mental circumstances envisaged where n will have a truncated Poisson distribution of some sort). This 
follows from the relation of (2) to David & Johnson’s u-form of discrete probability integral trans- 
formation. The numerical results given by Cox in his Table 1 serve to verify David & Johnson’s general 
conclusions in yet another instance, but it will be interesting to see the complete results of the systematic 
investigation of the problem on the lines they lay down, now being carried out by D. H. Young who 
has kindly provided the figures of Table 1. 


REFERENCES 


Cox, D. R. (1953). Biometrika, 40, 354-60.
David, F. N. & Johnson, N. L. (1950). Biometrika, 37, 42-9.
Przyborowski, J. & Wilenski, H. (1939). Biometrika, 31, 313-23.


The mathematical relation between Greenberg’s index of linguistic 
diversity and Yule’s characteristic 


By G. HERDAN 
University of Bristol 


At the first glance, the two measures mentioned in the title are different both in form and meaning: 
Greenberg’s constant is meant to measure the diversity of language in a speech community, and is 
mathematically the complement to unity of the sum of the squared probabilities of these languages. 
Yule’s constant characterizes the vocabulary-occurrence (type-token) relation in a given text, and is 
mathematically the ratio of the second to the squared first moment of the word count distribution. 


GREENBERG’S CONSTANT 


The examination of any map of linguistic areas will show regions of greater diversity and others of 
relative uniformity, while still others may seem intermediate between these extremes. Greenberg’s 
aim is to have quantitative objective measures of diversity by which to replace such subjective impres- 
sions. The simplest model, called the monolingual non-weighted method A, may be described as follows 
(Greenberg, 1956). 

If from a given area we choose two members of the population at random, the probability that these 
two individuals speak the same language can be considered a measure of its linguistic uniformity. If 
everyone speaks the same language, the probability that two such individuals speak the same language 
is obviously 1, or certainty. If each individual speaks a different language, the probability is zero. 
Since we are measuring diversity rather than uniformity, this measure must be subtracted from 1, so 
that our index will vary from 0, indicating the least diversity, to 1, indicating the greatest. 

If the area comprises speakers of r languages and the proportion of speakers of language i is p_i
(i = 1, 2, ..., r), then the total probability of choosing two speakers of the same language is the sum of
the probabilities of such an event for each individual language; i.e. Σ p_i². Subtracting this sum from 1
as suggested above, the formula for the coefficient of linguistic diversity becomes

\[ A = 1 - \sum_{i=1}^{r} p_i^{2}. \tag{1} \]

This may be illustrated by a hypothetical example. If in a population three languages are used and
the proportion of speakers is as ⅛ : ⅜ : ½, then

A = 1 - {(⅛)² + (⅜)² + (½)²} = 1 - 26/64 = 38/64, or 0.594.


YULE'S CHARACTERISTIC (Yule, 1944)

On the other hand, Yule's Characteristic K is

\[ K = 10^{4}\,(S_2/S_1^{2} - 1/S_1), \tag{2} \]

where S₁ and S₂ are the first and second moments, respectively, of the word count distribution, that is
the distribution of vocabulary items according to frequency of occurrence: S₁ = Σ f_x X; S₂ = Σ f_x X²,
where f_x is the number of words occurring X times. For large samples, and neglecting the factor 10⁴
which Yule used only so as to avoid very small values of K, the Characteristic becomes

\[ K^{*} = S_2/S_1^{2}, \tag{3} \]
which is a characteristic constant of the word count distribution epitomizing the relation between 
vocabulary and word occurrence, that is frequency of use, in a given literary text (or author). It may be 
regarded as measuring the extent to which word occurrences are concentrated upon particular vocabu- 


lary items, and it shows that a particular style is characterized by a constant relation between uniformity 
and diversity in the number of repetitions of the items of vocabulary (Herdan, 1956). 


THE REPEAT RATE 


Yule’s Characteristic is thus a measure of stylistic diversity, and is mathematically expressed as the ratio 
of the second to the squared first moment of the word count, whereas Greenberg’s constant is a measure 
of linguistic diversity among the members of a speech community and mathematically the sum of the 
squared probabilities of the various languages, which makes them look rather different from one another. 

However, the close relation between the two becomes apparent if we use the interpretation given in 
a paper by Good (1953) of the Characteristic K as the repeat rate of words, i.e. the probability that two 
words chosen at random from a text will be the same dictionary word. That paper suggests methods of 
estimating, among other things, various general population parameters measuring heterogeneity. One 
such parameter, for which Good uses the symbol ĉ₂,₀, is an estimate of the probability that two words
selected at random from the text under consideration will turn out to be the same word of the language. 
Hence, it tends to be larger the more repetitive is the author’s vocabulary, or more roughly, the smaller 
is the author’s vocabulary. 

If the population probabilities of the distinct words are p₁, p₂, ... then, as Good has shown,

\[ \hat{c}_{2,0} = \frac{S_2 - S_1}{S_1 (S_1 - 1)} \tag{4} \]

represents an unbiased estimate of p₁² + p₂² + ... as S₁ → ∞. Since Yule's Characteristic K is equal to
10⁴ ĉ₂,₀ (1 - 1/S₁):

\[ K = 10^{4}\,\hat{c}_{2,0}\left(1 - \frac{1}{S_1}\right) = 10^{4}\,\frac{S_2 - S_1}{S_1(S_1 - 1)} \cdot \frac{S_1 - 1}{S_1} = 10^{4}\left(\frac{S_2}{S_1^{2}} - \frac{1}{S_1}\right), \tag{5} \]

it follows that it tends to 10⁴(p₁² + p₂² + ...) as S₁ → ∞.
Thus, Greenberg’s measure of linguistic diversity A and Yule’s Characteristic K are both of the 
nature of a repeat rate, and therefore of essentially the same mathematical structure. 
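
The identity between formula (2) and 10⁴ ĉ₂,₀ (1 - 1/S₁) can be confirmed with exact arithmetic. The sketch below uses a made-up list of word counts (one count per distinct word); the counts and the function names are purely illustrative and are not taken from any text.

```python
from fractions import Fraction

def moments(word_counts):
    """S1 and S2 of the word count distribution, summed over individual words."""
    s1 = sum(word_counts)
    s2 = sum(x * x for x in word_counts)
    return s1, s2

def yule_k(word_counts):
    """Yule's Characteristic, formula (2)."""
    s1, s2 = moments(word_counts)
    return 10 ** 4 * (Fraction(s2, s1 ** 2) - Fraction(1, s1))

def repeat_rate_estimate(word_counts):
    """Good's estimate (S2 - S1) / (S1 (S1 - 1)), formula (4)."""
    s1, s2 = moments(word_counts)
    return Fraction(s2 - s1, s1 * (s1 - 1))

counts = [5, 3, 3, 1, 1, 1]          # hypothetical occurrences of six distinct words
s1, _ = moments(counts)
k_direct = yule_k(counts)
k_via_good = 10 ** 4 * repeat_rate_estimate(counts) * (1 - Fraction(1, s1))
print(k_direct, k_via_good)
```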


ILLUSTRATION 


If the summation in the numerator and the denominator of formula (3) refers not to f_x, i.e. groups
of words of equal occurrence frequency, but to the individual words, the formula becomes

\[ K^{*} = \sum_i X_i^{2} \Big/ \Bigl(\sum_i X_i\Bigr)^{2}, \]

where X is the frequency of occurrence of a word, and different values of X imply different probabilities
of occurrence. In Greenberg's formula for A, the corresponding probabilities of occurrence are the values
of p_i for the different languages. Using the values of p_i from our hypothetical example and assuming
a sample of the population of, say, 1000 inhabitants, the values of X result as

⅛ × 1000 =  125
⅜ × 1000 =  375
½ × 1000 =  500
           1000

and the characteristic is calculated as

K = (125² + 375² + 500²)/1000² = 406,250/1,000,000 = 0.406,
and A = 1 - K = 0.594,


which is the value of A calculated by formula (1), in accordance with the conclusion, reached on 
theoretical grounds, of the essential similarity in mathematical structure of Greenberg’s coefficient 
of linguistic diversity A and Yule’s stylo-statistical Characteristic K. 
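
The arithmetic of the illustration can be repeated exactly. The following sketch (names are illustrative) computes the repeat rate from the three counts and, separately, Greenberg's A from the three proportions, and confirms that the two routes agree.

```python
from fractions import Fraction

def repeat_rate_from_counts(counts):
    """Sum of squared counts over the squared total: formula (3) applied to individual items."""
    total = sum(counts)
    return Fraction(sum(x * x for x in counts), total * total)

def greenberg_a(proportions):
    """Greenberg's index of diversity: one minus the sum of squared proportions."""
    return 1 - sum(p * p for p in proportions)

counts = [125, 375, 500]                                   # speakers per language in a sample of 1000
proportions = [Fraction(1, 8), Fraction(3, 8), Fraction(1, 2)]

k_star = repeat_rate_from_counts(counts)
a_index = greenberg_a(proportions)
print(float(k_star), float(a_index))
```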


REFERENCES 


Good, I. J. (1953). The population frequencies of species and the estimation of population parameters.
Biometrika, 40, 237-64.

Greenberg, J. H. (1956). The measurement of linguistic diversity. Language, 32, 109-15.

Herdan, G. (1956). Language as Choice and Chance. Groningen.

Yule, G. Udny (1944). The Statistical Study of Literary Vocabulary. Cambridge University Press.


Note on a discontinuous probability density 


By J. E. KERRICH 
University of the Witwatersrand, Johannesburg 


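Consider the probability density p(x) defined as follows:

\[ p(x) = \begin{cases} a\,x_0^{a}\,x^{-a-1} & (0 < x_0 \le x < \xi;\ a > 0), \\ b\,x_0^{a}\,\xi^{\,b-a}\,x^{-b-1} & (\xi \le x < \infty;\ b > 0), \end{cases} \tag{1} \]

and is zero for all other values of x.
At x = ξ, p(x) has a saltus of amount

\[ (b - a)\,x_0^{a}\,\xi^{-a-1}. \tag{2} \]

Then, if P(x) = ∫_x^∞ p(x) dx,

\[ P(x) = \begin{cases} x_0^{a}\,x^{-a} & (x_0 \le x \le \xi), \\ x_0^{a}\,\xi^{\,b-a}\,x^{-b} & (\xi \le x < \infty), \end{cases} \tag{3} \]

and is a continuous function in the range x₀ ≤ x < ∞.
Writing z = log₁₀ x, ζ = log₁₀ ξ and z₀ = log₁₀ x₀ we have

\[ \log_{10} P(x) = \begin{cases} a(z_0 - z) & (x_0 \le x \le \xi), \\ a(z_0 - \zeta) + b(\zeta - z) & (\xi \le x < \infty). \end{cases} \tag{4} \]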


The above distribution function was suggested by the data given in Table 1 and Fig. 1. Attempts to 
‘fit’ this function by maximum likelihood methods were unsuccessful, and a modification of a method 
used in an earlier paper (Kerrich, 1949) was used instead. The technique described is essentially a large- 
sample technique. 

The range x₀ ≤ x < ∞ is divided into k+1 intervals x_i ≤ x < x_{i+1} (i = 0 to k), and out of n observations
of x, f_i fall within the interval x_i ≤ x < x_{i+1}, where x_{k+1} = ∞. It is assumed that the observations form a random
sample in the sense that for given n the f_i have a multinomial distribution. (5)







Then, if y_i = Σ_{j=i}^{k} f_j/n (i = 1, ..., k), it follows that for large n the y_i are approximately normally
correlated, with mean values and covariances

\[ E(y_i) = P(x_i) = P_i, \text{ say}, \qquad \operatorname{cov}(y_i y_j) = P_i(1 - P_j)/n \quad (P_j \ge P_i). \tag{6} \]

Next, if w_i = log₁₀(y_i) it follows that for large n the w_i are approximately normally correlated with

\[ E(w_i) = \log_{10} P(x_i), \]

which has the values given in (4), and

\[ \operatorname{cov}(w_i w_j) = \mu^{2}[1 - P_j]/nP_j, \tag{7} \]

where μ = 0.4343... and P_j ≥ P_i.

By graphical methods initial estimates are obtained for a, b and ζ (see Fig. 1). Call these estimates
a₀, b₀ and ζ₀ respectively, and let Δa = a - a₀, Δb = b - b₀ and Δζ = ζ - ζ₀.
Make the minor transformation

\[ v_i = \begin{cases} w_i - a_0(z_0 - z_i) & \text{when } z_i < \zeta_0 \text{ and } i = 1, 2, \dots, m, \\ w_i - a_0(z_0 - \zeta_0) - b_0(\zeta_0 - z_i) & \text{when } z_i > \zeta_0 \text{ and } i = m+1, \dots, k. \end{cases} \tag{8} \]

Then assuming that terms of higher order are negligible,

\[ \beta_i = E(v_i) = \begin{cases} \Delta a\,(z_0 - z_i) & \text{when } z_i < \zeta_0, \\ \Delta a\,(z_0 - \zeta_0) + \Delta b\,(\zeta_0 - z_i) + \Delta\zeta\,(b_0 - a_0) & \text{when } z_i > \zeta_0, \end{cases} \tag{9} \]

and

\[ \operatorname{cov}(v_i v_j) = \operatorname{cov}(w_i w_j) \tag{10} \]

as in (7). Thus the v_i are approximately multinormally distributed with joint probability density
proportional to

\[ \exp[-\tfrac{1}{2}(V - B)'C(V - B)], \tag{11} \]

where V is the column vector {v_i}, B is the column vector {β_i} and C is the inverse of the coefficient matrix

\[ \{\mu^{2}[1 - P_j]/nP_j\}. \tag{12} \]

The elements of C are

\[ C_{ii} = n\mu^{-2}P_i^{2}\,[(P_{i-1} - P_i)^{-1} + (P_i - P_{i+1})^{-1}], \qquad C_{i,i+1} = -n\mu^{-2}P_iP_{i+1}(P_i - P_{i+1})^{-1} = C_{i+1,i}, \tag{13} \]

and the remaining C_ij are zero. In matrix notation, (9) becomes

\[ B = K\Gamma, \text{ say}, \tag{14} \]

where Γ is the column vector of γ₁ = Δa, γ₂ = Δb and γ₃ = Δζ, which are the three quantities to be
estimated, and K is the k × 3 matrix of the coefficients appearing on the right-hand side of equations (9).
(In numerical applications m is chosen so that z_m < ζ₀ and z_{m+1} > ζ₀.)

Using (14), (11) becomes

\[ \exp[-\tfrac{1}{2}(V - K\Gamma)'C(V - K\Gamma)], \tag{15} \]

and if γ̂_i is the least-square estimate of γ_i (i = 1 to 3), then they are the solutions of the normal equations

\[ K'CV = K'CK\hat{\Gamma}, \tag{16} \]

in which Γ̂ is the column vector of the γ̂_i. Next, (V - KΓ)'C(V - KΓ) is to be split up into

\[ (V - K\hat{\Gamma})'C(V - K\hat{\Gamma}) + (\hat{\Gamma} - \Gamma)'K'CK(\hat{\Gamma} - \Gamma), \tag{17} \]

and it can then be shown that the γ̂_i are normally correlated with means γ_i and covariance matrix

\[ (K'CK)^{-1}, \tag{18} \]

while

\[ (V - K\hat{\Gamma})'C(V - K\hat{\Gamma}) = \chi^{2} \tag{19} \]

has a χ²-distribution with k - 3 D.F. In practical applications C is unknown and has to be estimated by
replacing the P_i by the approximation obtained when a₀, b₀ and ζ₀ are used.
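
The tridiagonal form (13) can be checked numerically against the covariance matrix (12). The sketch below is only an illustration: the function names are hypothetical, the P_i are set to rounded proportions of the kind shown in Table 1, n is an arbitrary sample size, and the end conventions P₀ = 1 and P_{k+1} = 0 are taken from P(x₀) = 1 and x_{k+1} = ∞.

```python
MU = 0.4342944819  # log10(e), the constant written 0.4343... in the text

def covariance_matrix(P, n):
    """Matrix (12): entry (i, j) is mu^2 (1 - P_m) / (n P_m), P_m the larger of P_i, P_j."""
    k = len(P)
    return [[MU ** 2 * (1 - max(P[i], P[j])) / (n * max(P[i], P[j]))
             for j in range(k)] for i in range(k)]

def inverse_matrix(P, n):
    """Tridiagonal matrix C from (13), using P_0 = 1 and P_{k+1} = 0."""
    k = len(P)
    ext = [1.0] + list(P) + [0.0]
    C = [[0.0] * k for _ in range(k)]
    for i in range(1, k + 1):
        C[i - 1][i - 1] = (n / MU ** 2) * ext[i] ** 2 * (
            1 / (ext[i - 1] - ext[i]) + 1 / (ext[i] - ext[i + 1]))
        if i < k:
            off = -(n / MU ** 2) * ext[i] * ext[i + 1] / (ext[i] - ext[i + 1])
            C[i - 1][i] = C[i][i - 1] = off
    return C

def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P = [0.5104, 0.2480, 0.0851, 0.0272, 0.0100, 0.0036, 0.0009]  # illustrative, strictly decreasing
product = matmul(covariance_matrix(P, 500), inverse_matrix(P, 500))
max_error = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                for i in range(len(P)) for j in range(len(P)))
print(max_error)
```

Because C is tridiagonal, the normal equations (16) involve only sums over neighbouring classes, which keeps the hand computation manageable.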

From (18), confidence intervals are obtained for the values of a, b and ζ, and if required, confidence
belts for the position of the straight lines in (4).

(19) provides a goodness-of-fit test and checks whether the model laid down in (1) is a plausible one 
for the data in hand. 

In the practical application considered here the data in Table 1 refer to the particle size distribution 
of airborne dust in a gold mine. 


























Table 1

Particle size                      Percentage of
in arbitrary      No. of           particles of
units             particles        size ≥ x
x                 f                100y

12.5                1                0.09
10                  2                0.36
 8                  5                1.00
 6                 14                2.72
 4                 50                8.51
 2                130               24.80
 1                160               51.04
 0.5              381              100.00
[Figure: w = log₁₀ y plotted against z = log₁₀ x.]
Fig. 1. The data are used with permission of the Transvaal and O.F.S. Chamber of Mines.


Dust is precipitated on to a glass slide. A small area on the slide is examined under a microscope. The 
field of view contains a set of small circles of known diameter. The observer matches each particle ob- 
served with one of these circles. Thus, of 381 particles stated to be of size 0-5 it is assumed that half are 
greater and half are less than that size, and so on for the other sizes given. With the microscope used, 
there is a lower limit x, (corresponding to 0-5 arbitrary units) below which practically nothing is known 
about the particle size distribution. No attempt is made here to make any judgement about what 
happens in that region. 
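
The percentage column of Table 1 can be recovered from the counts under one reading of the convention just described: half of each class is counted as exceeding its nominal size, and the half of the smallest class lying below x₀ is left out of the total. This reading is an assumption; the sketch below (names illustrative) shows that it reproduces the tabulated figures to within rounding.

```python
sizes = [12.5, 10, 8, 6, 4, 2, 1, 0.5]          # arbitrary units, largest first
counts = [1, 2, 5, 14, 50, 130, 160, 381]       # particles matched to each size

def percent_at_least(counts):
    """100y for each size: particles judged to be at least that size, as a percentage.

    Assumed convention: half of each class exceeds its nominal size, and the half of
    the smallest class that falls below the lower limit x0 is excluded from the total.
    """
    total = sum(counts) - counts[-1] / 2
    result, larger = [], 0.0
    for f in counts:
        result.append(100 * (larger + f / 2) / total)
        larger += f
    return result

percentages = percent_at_least(counts)
for x, pct in zip(sizes, percentages):
    print(x, round(pct, 2))
```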

In Fig. 1, w = log₁₀ y is plotted against z = log₁₀ x. The fact that the points lie close to two straight
lines suggested the model discussed in this note.







The computations yield the following results:

Parameter      95 % confidence limits

a              0.994 ± 0.099
b              3.16  ± 0.93
ζ              0.525 ± 0.054

(V - KΓ̂)'C(V - KΓ̂) = 2.860 = χ².

Since the corresponding P, for 4 D.F., is approximately 0.60, the 'fit' is satisfactory. Even if a physicist
might not care to accept the model discussed here as the 'true' law of distribution, it appears to be a
reasonable approximation in this particular case and in several other similar cases. The emphasis in
this note is on methods and not on applications.


REFERENCE 
Kerrich, J. E. (1949). Normalization of frequency functions. Nature, Lond., 164, 1089.


A remark on Spearman’s rank correlation coefficient 


By HARALD BERGSTROM 
Chalmers University of Technology, Géteborg 


The well-known Spearman's correlation coefficient ρ(P) for a permutation

\[ P = \begin{pmatrix} 1 & 2 & \dots & n \\ k_1 & k_2 & \dots & k_n \end{pmatrix} \]

is given by

\[ \rho(P) = 1 - \frac{6\,d(P)}{n^{2} - 1}, \qquad \text{where} \qquad d(P) = \frac{1}{n}\sum_{i=1}^{n} (i - k_i)^{2}. \]

The distribution function of ρ(P) has been studied in the case when all permutations are possible and
have the same probability. When the possible permutations belong to a subgroup (in the algebraic
sense) g of the symmetric group γ_n of all permutations I have found the following interesting fact.

If g is a transitive subgroup of γ_n, i.e. a subgroup such that the figure 1 can be transformed into the figures
1, 2, ..., n by the permutations of g, then the mean

\[ E_g[d(P)] = \frac{1}{\operatorname{ord}(g)} \sum_{P \in g} d(P) \]

(ord (g) denoting the order of g) is the same for all such subgroups,

\[ E_g[d(P)] = E_{\gamma_n}[d(P)]. \tag{1} \]

If furthermore g is double transitive, i.e. if (1, 2) can be transformed into every system (ν, λ), ν = 1, 2, ..., n,
λ = 1, 2, ..., n, ν ≠ λ, by the permutations of g, then d(P) has also the same variance for all such g.

In fact, I show that

\[ E_g[d^{2}(P)] = E_{\gamma_n}[d^{2}(P)] \tag{2} \]

for all g ⊂ γ_n which are double transitive. Then, of course, we also have

\[ \operatorname{var}_g[d(P)] = E_g[d^{2}(P)] - E_g^{2}[d(P)] = \operatorname{var}_{\gamma_n}[d(P)]. \tag{3} \]


As a consequence of (1) and (3) we get

\[ E_g[\rho(P)] = 0, \tag{4} \]

if g is a transitive subgroup of γ_n, and

\[ \operatorname{var}_g[\rho(P)] = \operatorname{var}_{\gamma_n}[\rho(P)], \tag{5} \]

if g is a subgroup of γ_n which is double transitive.
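I now prove these statements. Consider the polynomial

\[ d(P, x) = \frac{1}{n}\sum_{i=1}^{n} (x_i - x_i^{P})^{2}, \]

where x₁, x₂, ..., x_n are variables and x_i^P denotes the map of x_i by the permutation P. I assume that g
is a transitive subgroup of γ_n and prove that then

\[ E_g[d(P, x)] = E_{\gamma_n}[d(P, x)]. \tag{6} \]

In fact we have

\[ E_g[d(P, x)] = \frac{1}{\operatorname{ord}(g)} \sum_{P \in g} d(P, x) = \frac{1}{n\,\operatorname{ord}(g)} \sum_{i=1}^{n} \sum_{P \in g} (x_i - x_i^{P})^{2}. \tag{7} \]

Let g^{(i)} be that subgroup of all permutations in g which leave the figure i unchanged. Then g^{(i)} has the
order ord (g)/n and we easily get

\[ \frac{1}{\operatorname{ord}(g)} \sum_{P \in g} (x_i - x_i^{P})^{2} = \frac{1}{n}\sum_{j=1}^{n} (x_i - x_j)^{2}. \]

Hence we obtain from (7)

\[ E_g[d(P, x)] = \frac{1}{n^{2}} \sum_{i=1}^{n}\sum_{j=1}^{n} (x_i - x_j)^{2}. \]

The right side is independent of g.
If g is double transitive we have

\[ E_g[d^{2}(P, x)] = \frac{1}{\operatorname{ord}(g)} \sum_{P \in g} d^{2}(P, x) = \frac{1}{\operatorname{ord}(g)}\,\frac{1}{n^{2}} \sum_{i=1}^{n}\sum_{j=1}^{n} \sum_{P \in g} (x_i - x_i^{P})^{2}(x_j - x_j^{P})^{2}. \tag{8} \]

First we consider i ≠ j. Let g^{(i,j)} be the subgroup of all permutations in g which leave the pair (i, j)
unchanged. Then g^{(i,j)} has the order ord (g)/n(n - 1) and we get

\[ \frac{1}{\operatorname{ord}(g)} \sum_{P \in g} (x_i - x_i^{P})^{2}(x_j - x_j^{P})^{2} = \frac{1}{n(n-1)} \sum_{\substack{\nu \ne \lambda \\ \nu, \lambda = 1, \dots, n}} (x_i - x_\nu)^{2}(x_j - x_\lambda)^{2}. \]

For i = j we obtain in the same way as in the proof of (6)

\[ \frac{1}{\operatorname{ord}(g)} \sum_{P \in g} (x_i - x_i^{P})^{4} = \frac{1}{n}\sum_{j=1}^{n} (x_i - x_j)^{4}. \]

Hence we get by (8)

\[ E_g[d^{2}(P, x)] = \frac{1}{n^{3}(n-1)} \sum_{i \ne j}\ \sum_{\nu \ne \lambda} (x_i - x_\nu)^{2}(x_j - x_\lambda)^{2} + \frac{1}{n^{3}} \sum_{i=1}^{n}\sum_{j=1}^{n} (x_i - x_j)^{4}. \]

Here again the right side is independent of g. The theorem follows.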
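
The two statements can be checked by direct enumeration for a small n. The sketch below is illustrative only: it takes n = 5, labels the figures 0, ..., 4 instead of 1, ..., 5 (which leaves the differences i - k_i unchanged), uses the cyclic shifts as an example of a transitive subgroup, and uses the maps i → ai + b (mod 5) with a ≠ 0 as an example of a double transitive subgroup. These example groups are choices made here, not taken from the note.

```python
from fractions import Fraction
from itertools import permutations

def d(perm):
    """d(P) = (1/n) * sum of (i - k_i)^2 for the permutation i -> perm[i]."""
    n = len(perm)
    return Fraction(sum((i - k) ** 2 for i, k in enumerate(perm)), n)

def mean_and_variance(group):
    """Mean and variance of d(P) when every permutation in the group is equally likely."""
    values = [d(p) for p in group]
    mean = sum(values) / len(values)
    variance = sum(v * v for v in values) / len(values) - mean ** 2
    return mean, variance

n = 5
symmetric = list(permutations(range(n)))                               # all n! permutations
cyclic = [tuple((i + s) % n for i in range(n)) for s in range(n)]      # transitive
affine = [tuple((a * i + b) % n for i in range(n))
          for a in range(1, n) for b in range(n)]                      # double transitive

mean_sym, var_sym = mean_and_variance(symmetric)
mean_cyc, _ = mean_and_variance(cyclic)
mean_aff, var_aff = mean_and_variance(affine)
print(mean_sym, mean_cyc, mean_aff, var_sym, var_aff)
```

The common mean equals (n² - 1)/6, which is the value that makes the mean of ρ(P) zero.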







Interval estimation for the parameter of a binomial distribution 


By C. W. CLUNIES-ROSS†
Virginia Agricultural Experiment Service of the Virginia Polytechnic Institute 


1. INTRODUCTION 


It is well known that sets of confidence intervals for several confidence coefficients do not, in general, 
exist for a parameter of any finite discrete distribution when the parameter space contains an infinity 
of values. In particular this is true for the binomial distribution. However, it frequently occurs that some 
sort of interval estimate for the parameter of a binomial distribution is desired. 

Charts are available for determining estimation intervals, e.g. Pearson & Hartley (1954), based on 
the technique suggested by Clopper & Pearson (1934). This paper will establish certain properties of 
these intervals and then suggest an alternative technique. 


2. TERMINOLOGY 


This paper will consider estimation sets, or intervals, for the true parameter θ* of a binomial distribution.
The usual conditions are assumed to hold; these may be stated formally as follows:
(i) Admissible (or possible) values of θ* are the numbers between and including zero, unity; i.e.
θ* ⊂ (0, 1) where '⊂' means 'contained in'; '⊄' will be used for 'is not contained in'.
(ii) The sample, on which the estimation set is based, is of known, predetermined size equalling n.
Thus, n is a fixed number and not a random variable.
(iii) The probability of r 'successes' in the sample is defined by

\[ p_r(\theta^{*}) = \begin{cases} \dbinom{n}{r}\,\theta^{*r}(1 - \theta^{*})^{n-r}, & r = 0, 1, \dots, n, \\ 0, & \text{otherwise.} \end{cases} \tag{2.1} \]


Thus, r is a random variable which is observed for the particular sample. 

An estimation set for θ* may, technically, be any collection of admissible values. But any collection
other than ‘all those values contained in some interval’ is liable to be confusing in practical work. This 
difficulty does not arise in the technique suggested by Clopper & Pearson, but may arise with the 
alternative technique discussed in § 4. 

The estimation set is to be a function of r and is to be well defined when r is any integer in the range 
(0,n). This precludes from the paper any consideration of posterior randomization procedures. 

The following notation and terminology is used: 


R, S, S′: systems of estimation sets or estimation procedures; (2.2)
Θ(r, R): the estimation set defined for r by R; (2.3)
R(θ): the aggregate of r such that θ ⊂ Θ(r, R). (2.4)

R(θ) will be called the coverage set for θ on R, the estimation procedure used, since it is composed of
just those values of r whose estimation sets include θ. Either Θ(r, R) or R(θ) may be used as a specification
of R since either may be derived from the other because

r ∈ R(θ) implies that θ ⊂ Θ(r, R),

and θ ⊂ Θ(r, R) implies that r ∈ R(θ).
Clopper & Pearson's approach is to set up Θ(r, R) whereas my alternative technique is to set up R(θ).

\[ P(\theta \mid R) = \sum_{r \in R(\theta)} p_r(\theta). \tag{2.5} \]

P(θ | R) will be called the probability of coverage of θ on R since, if θ is the true value of the parameter,
it is the probability that an estimation set, to be based on a random sample, will contain the
true value of θ.


Note that P(θ | R) is a function of both θ and R. Note further that if R represented an exact system of
confidence intervals then P(θ | R) would be independent of θ.

a(R) = g.l.b. {P(θ | R)}, taken over 0 ≤ θ ≤ 1. (2.6)


† Research sponsored by the National Science Foundation.










a(R) will be called the confidence coefficient for R, since it is the minimum probability that the true 
value of the parameter will be contained in an estimation set which is to be based on a random sample. 

This definition (2.6) for confidence coefficient was given by Tweedie (1955). Other workers in this
field have used the term confidence coefficient similarly, but in terms of the algorithm for calculating
the estimation set; in these cases their confidence coefficient, as defined above, is greater than a(R). An
analogous situation occurs with the significance level for significance tests for discrete distributions,
e.g. Barnard (1947) in his treatment of the 2 × 2 contingency table. Following Tweedie I shall call this
other confidence coefficient the 'desired confidence coefficient'.


$\theta(\theta, R)$: the collection of admissible values of $\theta^*$ with the same coverage set as $\theta$ (on $R$). (2.7)

Thus if $\theta_1 \in \theta(\theta_2, R)$,
then (i) $R(\theta_1) = R(\theta_2)$,
(ii) $\theta(\theta_1, R) = \theta(\theta_2, R)$,
(iii) $\theta_2 \in \theta(\theta_1, R)$,
but (iv) $P(\theta_1 \mid R)$ is not in general equal to $P(\theta_2 \mid R)$.

These $\theta(\theta, R)$ divide the admissible parameter space into various regions. The estimation procedure $R$ will not make any distinctions between different possible parameter values in the same region, since either all or none of the values in any region will be contained in any estimation set.

Other notations that will be used are

$r < R(\theta)$: $r$ less than every member of $R(\theta)$, (2.8a)
$r > R(\theta)$: $r$ greater than every member of $R(\theta)$. (2.8b)


3. PROPERTIES OF THE METHOD OF CLOPPER & PEARSON (1934) 


This method, as generalized slightly by Stevens (1950), will be referred to throughout as $R$. Strictly, $R$ is a class of methods, an individual method being specified by $P_0$, $P_1$, as below, where $P_0, P_1 \geq 0$ and $0 < P_0 + P_1 < 1$. However, as $P_0$, $P_1$ will be considered as general but fixed, they will not be exhibited explicitly in $R$.

$R$, then, is the procedure leading to the estimation sets which are closed intervals

$\Theta(r, R) = [\theta_0(r, R), \theta_1(r, R)]$

defined by

$\sum_{i=r}^{n} p_i(\theta_0(r, R)) = P_0 \quad (r \neq 0)$, (3.1)
$\theta_0(0, R) = 0$, (3.2)
$\sum_{i=0}^{r} p_i(\theta_1(r, R)) = P_1 \quad (r \neq n)$, (3.3)
$\theta_1(n, R) = 1$. (3.4)

The desired confidence coefficient $= 1 - P_0 - P_1$.
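As a hedged illustration of (3.1) to (3.4), the limits can be found numerically by bisection on the binomial tail sums. The sketch below is an assumption-laden reading, not the author’s procedure: the helper names are hypothetical, and the pairing of $P_0$ with the lower limit and $P_1$ with the upper limit follows the reconstruction of the equations above.

```python
from math import comb

def tail_ge(r, n, theta):
    """Sum of p_i(theta) for i = r, ..., n (increasing in theta)."""
    return sum(comb(n, i) * theta**i * (1 - theta)**(n - i) for i in range(r, n + 1))

def tail_le(r, n, theta):
    """Sum of p_i(theta) for i = 0, ..., r (decreasing in theta)."""
    return sum(comb(n, i) * theta**i * (1 - theta)**(n - i) for i in range(0, r + 1))

def bisect(f, lo, hi, iters=200):
    """Root of a monotone function f that changes sign on [lo, hi]."""
    sign_lo = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(r, n, p0, p1):
    """Closed interval [theta_0(r), theta_1(r)] defined by (3.1)-(3.4)."""
    lower = 0.0 if r == 0 else bisect(lambda t: tail_ge(r, n, t) - p0, 0.0, 1.0)
    upper = 1.0 if r == n else bisect(lambda t: tail_le(r, n, t) - p1, 0.0, 1.0)
    return lower, upper

for r in (0, 5, 10):
    print(r, clopper_pearson(r, 10, 0.025, 0.025))
```

Bisection is enough here because each tail sum is monotone in θ, so each defining equation has a single root in (0, 1).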
$R'$ is the same as $R$ except that the estimation intervals are open. Stevens has demonstrated

$\operatorname{Prob}[\theta_0(r, R) \leq \theta^* \leq \theta_1(r, R)] \geq 1 - P_0 - P_1$, (3.5a)
$\operatorname{Prob}[\theta_0(r, R) < \theta^* < \theta_1(r, R)] \geq 1 - P_0 - P_1$. (3.5b)

His proof is vague enough to be valid whichever of $\theta^*$ or $(\theta_0, \theta_1)$ is considered to be random. He follows this with a probability statement concerning (3.5b) which is a function of $\theta^*$.

However, by a more roundabout method, stronger results than (3.5a, b) can be demonstrated:

if $\theta < \theta_1(0, R)$, then $P(\theta \mid R) \geq 1 - P_0$, (3.6a)
if $\theta > \theta_0(n, R)$, then $P(\theta \mid R) \geq 1 - P_1$, (3.6b)
for all $\theta$, $P(\theta \mid R) \geq 1 - P_0 - P_1$. (3.7)

It will be noted that I am considering the case when the intervals are closed, which corresponds to (3.5a); the difference between closed and open intervals is negligible when the confidence coefficient is defined by (2.6), provided that the open intervals are closed at 0 and 1.







First the following lemmata will be established:

$\sum_{i=0}^{k} p_i(\theta)$ is monotonically decreasing in $\theta$ for all $k$, (3.8a)
$\sum_{i=k}^{n} p_i(\theta)$ is monotonically increasing in $\theta$ for all $k$, (3.8b)
$R(\theta)$ is a consecutive set of integers for all $\theta$, (3.9)
if $r < s$, then $\theta_0(r, R) < \theta_0(s, R)$, (3.10a)
if $r < s$, then $\theta_1(r, R) < \theta_1(s, R)$, (3.10b)
$\theta(\theta, R)$ is a single interval for all $\theta$, (3.11)
$\theta_0(r + 1, R) < \theta_1(r, R)$, (3.12)
the aggregate set of the intervals $\theta(\theta, R)$ covers the interval (0, 1). (3.13)
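Proof of lemmata.

Put

$f(\theta, k) = \sum_{i=0}^{k} p_i(\theta) = \sum_{i=0}^{k} \binom{n}{i} \theta^{i} (1 - \theta)^{n-i}$,

$\dfrac{df}{d\theta} = \sum_{i=0}^{k} \left[ \binom{n}{i} i\, \theta^{i-1} (1 - \theta)^{n-i} - \binom{n}{i} (n - i)\, \theta^{i} (1 - \theta)^{n-i-1} \right] = -n \binom{n-1}{k} \theta^{k} (1 - \theta)^{n-k-1}$.

$\dfrac{df}{d\theta} < 0$ for all $k$, $n$, $\theta$ if $\theta \neq 0, 1$,

$\dfrac{df}{d\theta} \leq 0$ for all $k$, $n$ if $\theta = 0, 1$.

Hence $\sum_{i=0}^{k} p_i(\theta)$ is strictly monotonically decreasing in $\theta$; (3.8b) may be proved similarly.

Note that (3.8a), (3.8b) show that $\theta_0(r, R)$, $\theta_1(r, R)$ are well defined.
(3.14a, b) follow immediately.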


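If $\theta_1 < \theta_2$, then $\sum_{i=0}^{k} p_i(\theta_1) > \sum_{i=0}^{k} p_i(\theta_2)$ (for all $k$). (3.14a)

If $\theta_1 < \theta_2$, then $\sum_{i=k}^{n} p_i(\theta_1) < \sum_{i=k}^{n} p_i(\theta_2)$ (for all $k$). (3.14b)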
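The telescoping of the derivative used for lemma (3.8a) can be checked numerically. The following Python sketch is an illustration only, with hypothetical function names: it compares the closed form $-n\binom{n-1}{k}\theta^{k}(1-\theta)^{n-k-1}$ with a central finite difference of the cumulative sum, for $k < n$.

```python
from math import comb

def cum_prob(k, n, theta):
    """f(theta, k): sum of p_i(theta) for i = 0, ..., k."""
    return sum(comb(n, i) * theta**i * (1 - theta)**(n - i) for i in range(k + 1))

def cum_prob_derivative(k, n, theta):
    """Closed form of df/dtheta; the sum is identically 1 when k = n."""
    if k >= n:
        return 0.0
    return -n * comb(n - 1, k) * theta**k * (1 - theta)**(n - k - 1)

def finite_difference(k, n, theta, h=1e-6):
    """Central-difference approximation to df/dtheta."""
    return (cum_prob(k, n, theta + h) - cum_prob(k, n, theta - h)) / (2 * h)

for k in (0, 3, 9):
    print(k, cum_prob_derivative(k, 10, 0.3), finite_difference(k, 10, 0.3))
```

A negative derivative throughout the open interval is what makes the defining equations (3.1) and (3.3) have unique solutions.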


(3.9) follows from the definition of $R$ and (3.14a, b).

(3.10a, b) follow from the definition of $R$, (3.14a, b) and the fact that $p_r(\theta) > 0$.

(3.11) is proved by reductio ad absurdum on (3.10a, b).

(3.12) follows from the definition and the restriction $1 - P_0 - P_1 > 0$. From (3.12) it follows that every admissible value of $\theta$ lies within some estimation set; hence that $R(\theta)$ is never null; hence (3.13).

Proof of (3.6a, b).

Let $s$ be the maximum integer such that $\theta_0(s, R) < \theta_1(0, R)$. Then by (3.12) $s > 0$. Let $\theta' < \theta_0(s, R)$. Then $R(\theta') = (0, 1, \ldots, k)$, where $k$ depends on $\theta'$. Further, $\theta(\theta', R)$ is bounded above by some $\theta_0(k + 1, R)$, and $\theta_0(k + 1, R) \notin \theta(\theta', R)$ from (3.10b) and closed interval. Hence for $\theta \in \theta(\theta', R)$, by (3.8a), (3.9) and (3.1), we have

$\operatorname{g.l.b.} P(\theta \mid R) = \sum_{i=0}^{k} p_i(\theta_0(k + 1, R)) = 1 - \sum_{i=k+1}^{n} p_i(\theta_0(k + 1, R)) = 1 - P_0$.

If $\theta_0(s, R) \leq \theta < \theta_1(0, R)$ ($s$ as above), then $P(\theta \mid R) \geq 1 - P_0$, for then $R(\theta) = (0, 1, \ldots, s)$ by (3.9) and

$\operatorname{g.l.b.} P(\theta \mid R) = \sum_{i=0}^{s} p_i(\theta_1(0, R)) \geq \sum_{i=0}^{s} p_i(\theta_0(s + 1, R)) = 1 - P_0$

with equality only if $\theta_0(s + 1, R) = \theta_1(0, R)$.










Proof of (3.7).

If $\theta_1(0, R) > \theta_0(n, R)$ then (3.7) follows immediately from (3.6a, b).

If otherwise, let $\theta_1(0, R) < \theta < \theta_0(n, R)$; then $R(\theta) = (k_1, k_1 + 1, \ldots, k_2)$, where $k_1 > 0$, $k_2 < n$ and $\theta_1(k_1 - 1, R) < \theta < \theta_0(k_2 + 1, R)$. Hence

$P(\theta \mid R) = \sum_{i=k_1}^{k_2} p_i(\theta) = 1 - \sum_{i=0}^{k_1 - 1} p_i(\theta) - \sum_{i=k_2+1}^{n} p_i(\theta) > 1 - P_0 - P_1$

by (3.8a, b). This result together with (3.6a, b) is sufficient.

Note further that

$a(R) \geq 1 - P_0 - P_1$,
$a(R') \geq 1 - P_0 - P_1$.

Equality can occur only if the sequences $\theta_0(0, R), \theta_0(1, R), \ldots, \theta_0(n, R)$ and $\theta_1(0, R), \theta_1(1, R), \ldots, \theta_1(n, R)$ have a member in common; this does not occur for general $P_0$, $P_1$, $n$.
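The bound (3.7) can be examined numerically by working with coverage sets directly, without solving for the interval limits. In the Python sketch below (hypothetical names, and a grid check rather than a proof), $r$ is placed in $R(\theta)$ when the upper tail from $r$ is at least $P_0$ and the lower tail up to $r$ is at least $P_1$; by the monotonicity lemmas this is assumed equivalent to $\theta_0(r, R) \leq \theta \leq \theta_1(r, R)$.

```python
from math import comb

def pmf(r, n, theta):
    """p_r(theta) for the binomial distribution."""
    return comb(n, r) * theta**r * (1 - theta)**(n - r)

def coverage_set_cp(theta, n, p0, p1):
    """R(theta) for the Clopper & Pearson procedure, from tail sums alone."""
    probs = [pmf(i, n, theta) for i in range(n + 1)]
    keep = []
    for r in range(n + 1):
        upper_tail = sum(probs[r:])       # sum of p_i for i >= r
        lower_tail = sum(probs[:r + 1])   # sum of p_i for i <= r
        if (r == 0 or upper_tail >= p0) and (r == n or lower_tail >= p1):
            keep.append(r)
    return keep

def coverage_prob_cp(theta, n, p0, p1):
    """P(theta | R) as in (2.5)."""
    return sum(pmf(r, n, theta) for r in coverage_set_cp(theta, n, p0, p1))

def min_coverage(n, p0, p1, grid=2001):
    """Smallest coverage probability found on an equally spaced grid."""
    return min(coverage_prob_cp(i / (grid - 1), n, p0, p1) for i in range(grid))

print(coverage_set_cp(0.5, 10, 0.025, 0.025))
print(min_coverage(10, 0.025, 0.025))
```

Because a grid can miss the exact point at which the greatest lower bound is approached, the printed minimum should be read as an upper estimate of $a(R)$.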


4. ALTERNATIVE TECHNIQUE 


I have used a less direct approach than Stevens in order to develop the notation of coverage sets, in terms of which I shall develop an alternative technique. The results (3.6a, b) could have been proved in the same manner as Stevens proved (3.5a, b).

Denote this alternative technique for obtaining estimation sets by $S$, with desired confidence coefficient $\alpha$. $S$ is defined by

if $r \in S(\theta)$ and $s \notin S(\theta)$, then $p_r(\theta) \geq p_s(\theta)$, (4.1)
$P(\theta \mid S)$ is minimized for each $\theta$ subject to the above restrictions. (4.2)

This is analogous to Armsen’s (1955) treatment of the 2 × 2 contingency table.
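One way to read (4.1) and (4.2) is as a greedy construction: for each $\theta$, outcomes are admitted in order of decreasing probability until their total first reaches $\alpha$. The Python sketch below follows that reading. It assumes that the restrictions in (4.2) include $P(\theta \mid S) \geq \alpha$, the function names are hypothetical, and ties between equal probabilities are broken arbitrarily.

```python
from math import comb

def pmf(r, n, theta):
    """p_r(theta) for the binomial distribution."""
    return comb(n, r) * theta**r * (1 - theta)**(n - r)

def coverage_set_s(theta, n, alpha):
    """S(theta) and its probability, built from the most probable outcomes."""
    order = sorted(range(n + 1), key=lambda r: -pmf(r, n, theta))
    chosen, total = [], 0.0
    for r in order:
        if total >= alpha:
            break
        chosen.append(r)
        total += pmf(r, n, theta)
    return sorted(chosen), total

for theta in (0.1, 0.333, 0.35, 0.5):
    print(theta, coverage_set_s(theta, 3, 0.4444))
```

Any set satisfying (4.1) consists of the most probable outcomes, so stopping at the first total that reaches $\alpha$ gives the smallest admissible coverage probability when no probabilities are tied.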
Desired properties of $S$ are

$\Theta(r, S)$ to be defined and not void for all $r$, (4.3)
$a(S) = \alpha$, (4.4)
$\Theta(r, S)$ to be a single interval for all $r$. (4.5)

It is found that these estimation sets do not always possess (4.5). For example, if $n = 3$, $\alpha = 0.4444$,

$\Theta(0, S) = [0, 0.330] \cup [0.337, 0.366]$,

i.e. a pair of intervals.
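The gap in this example can be traced by elementary arithmetic. For $n = 3$ the probability $p_1(\theta) = 3\theta(1-\theta)^2$ has maximum $4/9$ at $\theta = 1/3$, which just exceeds $\alpha = 0.4444$, so near $\theta = 1/3$ the single outcome $r = 1$ suffices and $r = 0$ drops out of $S(\theta)$; beyond that, $r = 0$ is retained only while $p_0(\theta) \geq p_2(\theta)$. The Python sketch below locates the three end-points by bisection. It is an illustration under the greedy reading of (4.1) and (4.2), with hypothetical helper names.

```python
def p0(t):
    """p_0(theta) for n = 3."""
    return (1 - t) ** 3

def p1(t):
    """p_1(theta) for n = 3."""
    return 3 * t * (1 - t) ** 2

def p2(t):
    """p_2(theta) for n = 3."""
    return 3 * t ** 2 * (1 - t)

def bisect(f, lo, hi, iters=100):
    """Root of a monotone function f that changes sign on [lo, hi]."""
    sign_lo = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.4444
gap_lo = bisect(lambda t: p1(t) - alpha, 0.25, 1 / 3)   # p_1 first reaches alpha
gap_hi = bisect(lambda t: p1(t) - alpha, 1 / 3, 0.40)   # p_1 falls below alpha again
last = bisect(lambda t: p0(t) - p2(t), 0.34, 0.40)      # p_0 = p_2, i.e. 1/(1 + sqrt(3))
print(round(gap_lo, 4), round(gap_hi, 4), round(last, 4))
```

The computed end-points are close to the values 0.330, 0.337 and 0.366 quoted in the text.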

It seems, however, that for reasonably large n (> 10) this situation does not readily occur. Although 
the above estimation interval may not perturb a mathematical statistician it is liable to worry the 
general worker trying to analyse an experiment. 

The situation can be avoided by putting a prior restriction on $S(\theta)$.

If $r < S(\theta)$ and $\theta' > \theta$, then $r < S(\theta')$; (4.6a)
if $r > S(\theta)$ and $\theta' < \theta$, then $r > S(\theta')$. (4.6b)

The desired property follows immediately from (4.6a, b) and the fact that $S'(\theta)$ is a consecutive set of integers.

Let $S'$ be the technique defined by (4.6a, b), (4.1) and (4.2). Then for $n = 3$, $\alpha = 0.4444$, we have

$\Theta(0, S') = [0, 0.366]$.

In practice it seems that the restriction (4.6a, b) is not needed for the usual values of $\alpha = 0.9500$, $0.9000$.

$S$, $S'$ both possess the property (4.3) since $S(\theta)$, $S'(\theta)$ are not void for any $\theta$. They also possess (4.4) for all $n$ provided $\alpha > \frac{1}{2}$. If $n > 1$ then they also have the property for some values of $\alpha < \frac{1}{2}$.

Possession of (4.4) may be readily shown for $S'$. Note that $S'(\theta)$ is a consecutive set of integers for all $\theta$, therefore both $\theta(\theta, S')$, $\Theta(r, S')$ are intervals. The latter may therefore be replaced by

$[\theta_0(r, S'), \theta_1(r, S')]$.

The former, for $\theta = 0$, $S'(\theta) = (0)$, is then $[0, \theta_0(1, S'))$, the upper bound not included. $P(\theta \mid S') \to \alpha$ as $\theta \to \theta_0(1, S')$.







For $n = 10$, $\alpha = 0.9545$, it was found that for all $\theta$:

$S(\theta) \subset R(\theta)$, (4.7a)

and so for all $r$:

$\Theta(r, S) \subset \Theta(r, R)$, (4.7b)

$a(S) = 0.9545$, $a(R) = 0.9635$.

Also for $\theta \in [0, 0.25] \cup [0.75, 1]$: $P(\theta \mid R) \geq 0.9773$.

$R$ and $S$, $S'$ are difficult to compare for general $n$, $\alpha$ since $R$ is defined in terms of cumulative probabilities and $S$ in terms of individual probabilities, but it seems that the inequalities (4.7a, b) hold for at least ‘most’ $n$. When they do hold then $S$, $S'$ may be said to be more selective than $R$, for the same desired confidence coefficient.
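Whether the inclusion (4.7a) holds for a given $n$ and $\alpha$ can be explored on a grid of $\theta$ values. The Python sketch below is a rough check under stated assumptions, not a reproduction of the author’s calculation: it takes $P_0 = P_1 = (1 - \alpha)/2$ for $R$, uses the greedy reading of (4.1) and (4.2) for $S$, and all function names are hypothetical.

```python
from math import comb

def pmf(r, n, theta):
    """p_r(theta) for the binomial distribution."""
    return comb(n, r) * theta**r * (1 - theta)**(n - r)

def set_r(theta, n, p0, p1):
    """R(theta): r kept when neither tail through r falls below its limit."""
    probs = [pmf(i, n, theta) for i in range(n + 1)]
    return [r for r in range(n + 1)
            if (r == 0 or sum(probs[r:]) >= p0)
            and (r == n or sum(probs[:r + 1]) >= p1)]

def set_s(theta, n, alpha):
    """S(theta): most probable outcomes until their total reaches alpha."""
    order = sorted(range(n + 1), key=lambda r: -pmf(r, n, theta))
    chosen, total = [], 0.0
    for r in order:
        if total >= alpha:
            break
        chosen.append(r)
        total += pmf(r, n, theta)
    return sorted(chosen)

def s_within_r(n, alpha, grid=1001):
    """True if S(theta) is a subset of R(theta) at every grid point."""
    p = (1 - alpha) / 2
    thetas = (i / (grid - 1) for i in range(grid))
    return all(set(set_s(t, n, alpha)) <= set(set_r(t, n, p, p)) for t in thetas)

print(set_s(0.5, 10, 0.9545), set_r(0.5, 10, 0.02275, 0.02275))
print(s_within_r(10, 0.9545))
```

A grid check of this kind can only fail to find a counter-example; it does not establish the inclusion for every $\theta$.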


5. SUMMARY 


Confidence coefficient: the confidence coefficient for $R$ is ‘almost always’ in excess of the desired confidence coefficient, whereas for $S$, $S'$ they are the same.

Selectivity: in the cases calculated it has been found that $S$, $S'$ are more selective than $R$.

These two features are linked to a certain extent. However, if $S$, $S'$ did not embody restriction (4.1) the desired confidence coefficient could have been equal to the actual confidence coefficient without any increase in selectivity.

These results are consequences of the fact that $S$, $S'$ do not include any restriction about the distribution of probabilities over the sample region not included in the various coverage sets. The usual application of $R$ has $P_0 = P_1$, i.e. has its coverage set placed as centrally, in probability, as possible.

This centralization, though intuitively plausible for a set of distributions which are strictly unimodal and symmetric for all $\theta$, loses much of its intuitive appeal when the distributions are not, in general, symmetric.

The alternative system of setting up estimation intervals may be applied to estimation for parameters of other discrete distributions. However, the appropriateness of the system seems on a cursory examination to be linked with the fact that, in some cases including the binomial, the probability distribution having maximum likelihood for a particular observation also has that observation for its mode.


REFERENCES 


Armsen, P. (1955). Tables of significance tests of 2 × 2 contingency tables. Biometrika, 42, 494-9.

Barnard, G. A. (1947). Significance tests for 2 × 2 tables. Biometrika, 34, 123-8.

Clopper, C. J. & Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26, 404-13.

Pearson, E. S. & Hartley, H. O. (1954). Biometrika Tables for Statisticians. Cambridge University Press.

Stevens, W. L. (1950). Fiducial limits for the parameter of a discontinuous distribution. Biometrika, 37, 117-21.

Tweedie, M. C. K. (1955). On Parameter Estimates Expressed as Sets of Values. Unpublished Mimeo. Virginia Polytechnic Institute.


Further critical values for the sum of two variances 


By A. HUITSON 
Ferranti Ltd, Lily Hill, Bracknell, Berks. 


In a previous paper, Huitson (1955), the author has considered the problem of assigning confidence limits, calculable from the observations, to functions of the form $\sum_{i=1}^{m} \lambda_i \sigma_i^2$, where $\lambda_i$ is an arbitrary constant and $\sigma_i^2$ is an unknown variance for which we have an estimate $s_i^2$ based on $f_i$ degrees of freedom, where $f_i s_i^2/\sigma_i^2$ is distributed as $\chi^2$. In particular, tables of the upper and lower 5 and 1% critical values of $(\lambda_1 s_1^2 + \lambda_2 s_2^2)/(\lambda_1 \sigma_1^2 + \lambda_2 \sigma_2^2)$, where $\lambda_1$ and $\lambda_2$ are positive constants, were given. Tables of the upper and lower 2½ and ½% critical values are now presented.

The tables were calculated from the series solution obtained in the earlier paper for the general case. 
Using the criterion that the series solution gives two-decimal accuracy, if the last calculated term is 
less than 0-005, the tabled values will be correct, except possibly those for which one of the degrees of 







Table 1. Lower 2½% critical values of $(\lambda_1 s_1^2 + \lambda_2 s_2^2)/(\lambda_1 \sigma_1^2 + \lambda_2 \sigma_2^2)$

Columns: values of $\lambda_1 s_1^2/(\lambda_1 s_1^2 + \lambda_2 s_2^2)$

  f1    f2    0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0

  16    16   0.43  0.46  0.48  0.52  0.57  0.59  0.57  0.52  0.48  0.46  0.43
        36   0.59  0.62  0.64  0.66  0.65  0.61  0.56  0.51  0.48  0.46  0.43
       144   0.78  0.79  0.78  0.72  0.65  0.59  0.55  0.51  0.48  0.46  0.43
         ∞   1.00  0.89  0.79  0.71  0.65  0.60  0.55  0.52  0.49  0.46  0.43

  36    16   0.43  0.46  0.48  0.51  0.56  0.61  0.65  0.66  0.64  0.62  0.59
        36   0.59  0.62  0.64  0.67  0.69  0.70  0.69  0.67  0.64  0.62  0.59
       144   0.78  0.80  0.81  0.80  0.77  0.74  0.70  0.67  0.64  0.62  0.59
         ∞   1.00  0.94  0.88  0.83  0.78  0.74  0.71  0.67  0.64  0.62  0.59

 144    16   0.43  0.46  0.48  0.51  0.55  0.59  0.65  0.72  0.78  0.79  0.78
        36   0.59  0.62  0.64  0.67  0.70  0.74  0.77  0.80  0.81  0.80  0.78
       144   0.78  0.80  0.82  0.83  0.84  0.84  0.84  0.83  0.82  0.80  0.78
         ∞   1.00  0.97  0.95  0.92  0.90  0.88  0.86  0.84  0.82  0.80  0.78

   ∞    16   0.43  0.46  0.49  0.52  0.55  0.60  0.65  0.71  0.79  0.89  1.00
        36   0.59  0.62  0.64  0.67  0.71  0.74  0.78  0.83  0.88  0.94  1.00
       144   0.78  0.80  0.82  0.84  0.86  0.88  0.90  0.92  0.95  0.97  1.00
         ∞   1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00

Table 2. Upper 2½% critical values of $(\lambda_1 s_1^2 + \lambda_2 s_2^2)/(\lambda_1 \sigma_1^2 + \lambda_2 \sigma_2^2)$

Columns: values of $\lambda_1 s_1^2/(\lambda_1 s_1^2 + \lambda_2 s_2^2)$

  f1    f2    0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0

  16    16   1.80  1.69  1.62  1.57  1.54  1.53  1.54  1.57  1.62  1.69  1.80
        36   1.51  1.45  1.42  1.41  1.42  1.44  1.47  1.52  1.58  1.68  1.80
       144   1.24  1.23  1.24  1.26  1.30  1.34  1.40  1.47  1.56  1.67  1.80
         ∞   1.00  1.05  1.10  1.15  1.21  1.28  1.36  1.45  1.55  1.67  1.80

  36    16   1.80  1.68  1.58  1.52  1.47  1.44  1.42  1.41  1.42  1.45  1.51
        36   1.51  1.45  1.40  1.37  1.35  1.35  1.35  1.37  1.40  1.45  1.51
       144   1.24  1.22  1.22  1.22  1.24  1.26  1.29  1.33  1.38  1.44  1.51
         ∞   1.00  1.04  1.07  1.11  1.16  1.20  1.25  1.31  1.37  1.44  1.51

 144    16   1.80  1.67  1.56  1.47  1.40  1.34  1.30  1.26  1.24  1.23  1.24
        36   1.51  1.44  1.38  1.33  1.29  1.26  1.24  1.22  1.22  1.22  1.24
       144   1.24  1.22  1.20  1.18  1.17  1.17  1.17  1.18  1.20  1.22  1.24
         ∞   1.00  1.02  1.04  1.06  1.09  1.11  1.13  1.16  1.19  1.21  1.24

   ∞    16   1.80  1.67  1.55  1.45  1.36  1.28  1.21  1.15  1.10  1.05  1.00
        36   1.51  1.44  1.37  1.31  1.25  1.20  1.16  1.11  1.07  1.04  1.00
       144   1.24  1.21  1.19  1.16  1.13  1.11  1.09  1.06  1.04  1.02  1.00
         ∞   1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00

















Table 3. Lower ½% critical values of $(\lambda_1 s_1^2 + \lambda_2 s_2^2)/(\lambda_1 \sigma_1^2 + \lambda_2 \sigma_2^2)$

Columns: values of $\lambda_1 s_1^2/(\lambda_1 s_1^2 + \lambda_2 s_2^2)$

  f1    f2    0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0

  16    16   0.33  0.34  0.36  0.41  0.46  0.49  0.46  0.41  0.36  0.34  0.33
        36   0.50  0.52  0.55  0.58  0.56  0.50  0.43  0.39  0.36  0.35  0.33
       144   0.72  0.74  0.71  0.62  0.53  0.46  0.42  0.39  0.37  0.35  0.33
         ∞   1.00  0.84  0.70  0.60  0.52  0.47  0.43  0.40  0.37  0.35  0.33

  36    16   0.33  0.35  0.36  0.39  0.43  0.50  0.56  0.58  0.55  0.52  0.50
        36   0.50  0.52  0.55  0.58  0.61  0.63  0.61  0.58  0.55  0.52  0.50
       144   0.72  0.74  0.75  0.74  0.70  0.66  0.62  0.58  0.55  0.52  0.50
         ∞   1.00  0.91  0.83  0.77  0.71  0.66  0.62  0.58  0.55  0.52  0.50

 144    16   0.33  0.35  0.37  0.39  0.42  0.46  0.53  0.62  0.71  0.74  0.72
        36   0.50  0.52  0.55  0.58  0.62  0.66  0.70  0.74  0.75  0.74  0.72
       144   0.72  0.74  0.76  0.78  0.79  0.80  0.79  0.78  0.76  0.74  0.72
         ∞   1.00  0.96  0.93  0.90  0.87  0.84  0.81  0.79  0.76  0.74  0.72

   ∞    16   0.33  0.35  0.37  0.40  0.43  0.47  0.52  0.60  0.70  0.84  1.00
        36   0.50  0.52  0.55  0.58  0.62  0.66  0.71  0.77  0.83  0.91  1.00
       144   0.72  0.74  0.76  0.79  0.81  0.84  0.87  0.90  0.93  0.96  1.00
         ∞   1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00

Table 4. Upper ½% critical values of $(\lambda_1 s_1^2 + \lambda_2 s_2^2)/(\lambda_1 \sigma_1^2 + \lambda_2 \sigma_2^2)$

Columns: values of $\lambda_1 s_1^2/(\lambda_1 s_1^2 + \lambda_2 s_2^2)$

  f1    f2    0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0

  16    16   2.14  1.96  1.85  1.80  1.76  1.74  1.76  1.80  1.85  1.96  2.14
        36   1.71  1.63  1.58  1.57  1.58  1.61  1.65  1.71  1.80  1.94  2.14
       144   1.33  1.31  1.32  1.36  1.40  1.45  1.52  1.62  1.76  1.93  2.14
         ∞   1.00  1.06  1.12  1.19  1.26  1.35  1.46  1.58  1.74  1.92  2.14

  36    16   2.14  1.94  1.80  1.71  1.65  1.61  1.58  1.57  1.58  1.63  1.71
        36   1.71  1.61  1.55  1.51  1.48  1.48  1.48  1.51  1.55  1.61  1.71
       144   1.33  1.30  1.29  1.30  1.32  1.35  1.39  1.44  1.51  1.60  1.71
         ∞   1.00  1.04  1.09  1.14  1.20  1.26  1.33  1.41  1.50  1.60  1.71

 144    16   2.14  1.93  1.76  1.62  1.52  1.45  1.40  1.36  1.32  1.31  1.33
        36   1.71  1.60  1.51  1.44  1.39  1.35  1.32  1.30  1.29  1.30  1.33
       144   1.33  1.29  1.26  1.24  1.23  1.23  1.23  1.24  1.26  1.29  1.33
         ∞   1.00  1.03  1.05  1.08  1.11  1.14  1.17  1.21  1.25  1.29  1.33

   ∞    16   2.14  1.92  1.74  1.58  1.46  1.35  1.26  1.19  1.12  1.06  1.00
        36   1.71  1.60  1.50  1.41  1.33  1.26  1.20  1.14  1.09  1.04  1.00
       144   1.33  1.29  1.25  1.21  1.17  1.14  1.11  1.08  1.05  1.03  1.00
         ∞   1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00



















freedom is 16, which may be one or two units in error in the second decimal place. For intermediate values of $f$, it is necessary to interpolate with respect to $12/\sqrt{f}$.
For an example of the use of the tables, reference may be made to the earlier paper.
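The interpolation rule can be sketched in Python. The note does not say which interpolation formula is intended, so linear interpolation in $12/\sqrt{f}$ is an assumption here, as are the function names; the variable takes the equally spaced values 3, 2, 1, 0 at the tabulated $f$ = 16, 36, 144, ∞. The sample values are the first column of Table 1, where the ratio depends on $f_2$ only.

```python
from math import sqrt, inf

def harmonic_scale(f):
    """The interpolation variable 12/sqrt(f), taken as 0 for infinite f."""
    return 0.0 if f == inf else 12.0 / sqrt(f)

def interpolate_in_f(f, tabulated):
    """Linear interpolation in 12/sqrt(f) between bracketing tabulated values."""
    pts = sorted((harmonic_scale(k), v) for k, v in tabulated.items())
    x = harmonic_scale(f)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("f lies outside the tabulated range")

# Table 1, column 0.0: lower 2.5% critical values for f2 = 16, 36, 144, infinity.
lower_2_5 = {16: 0.43, 36: 0.59, 144: 0.78, inf: 1.00}
value = interpolate_in_f(64, lower_2_5)
print(round(value, 3))
```

For $f = 64$ the variable is 1.5, midway between the entries for $f = 144$ and $f = 36$, so the interpolated value is the mean of 0.78 and 0.59.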


REFERENCES 


Huitson, A. (1955). A method of assigning confidence limits to linear combinations of variances, 
Biometrika, 42, 471-9. 


Studies in the history of probability and statistics. 
VIII. De Morgan and the statistical study of literary style 


By R. D. LORD 
Royal College of Science and Technology, Glasgow 


C. B. Williams has recently given an account of two little-known papers published by T. C. Mendenhall 
in 1887 and 1901 on the statistical analysis of literary style, but was unable to trace Mendenhall’s 
reference to de Morgan’s suggestion that one might identify an author by the average length of his 
words. In his first paper Mendenhall said that he saw the suggestion 5 or 6 years earlier in a book by 
de Morgan, possibly his Budget of Paradoxes. I think it fairly certain that Mendenhall’s memory was 
at fault, that the book was in fact Memoir of Augustus de Morgan by his wife Sophia, published in 1882, 
and that the suggestion occurs in a letter of 1851 to an old Cambridge friend, the Rev. W. Heald. The 
rest of the book has no hint that de Morgan ever followed up his idea. Had he done so he would probably 
have found, as did Mendenhall, Yule and Williams, that word-length is an unsatisfactory criterion
compared with sentence-length. The letter can be allowed to speak for itself. 


7 Camden Street, 
Aug. 18, 1851 
Dear Heald, 

It has become quite the regular thing for the depth of vacation to remind me—not of you, for anything 
that carries my thoughts back to Cambridge does that,—but of inquiring how you are getting on, of which 
please write speedy word, according to custom, once a year.... 

* * * * 


I wish you would do this: run your eye over any part of those of St. Paul’s Epistles which begin with Παῦλος
—the Greek, I mean—and without paying any attention to the meaning. Then do the same with the Epistle
to the Hebrews, and try to balance in your own mind the question whether the latter does not deal in longer 
words than the former. It has always run in my head that a little expenditure of money would settle questions 
of authorship in this way. The best mode of explaining what I would try will be to put down the results I 
should expect as if I had tried them. 

Count a large number of words in Herodotus—say all the first book—and count all the letters; divide the 
second numbers by the first, giving the average number of letters to a word in that book. 

Do the same with the second book. I should expect a very close approximation. If Book I. gave 5.624 letters
per word, it would not surprise me if Book II. gave 5.619. I judge by other things.

But I should not wonder if the same result applied to two books of Thucydides gave, say 5.713 and 5.728.
That is to say, I should expect the slight differences between one writer and another to be well maintained
against each other, and very well agreeing with themselves. If this fact were established there, if St. Paul’s
Epistles which begin with Παῦλος gave 5.428 and the Hebrews gave 5.516, for instance, I should feel quite
sure that the Greek of the Hebrews (passing no verdict on whether Paul wrote in Hebrew and another translated)
was not from the pen of Paul.

If scholars knew the law of averages as well as mathematicians, it would be easy to raise a few hundred 
pounds to try this experiment on a grand scale. I would have Greek, Latin, and English tried, and I should 
expect to find that one man writing on two different subjects agrees more nearly with himself than two different 
men writing on the same subject. Some of these days spurious writings will be detected by this test. Mind, 
I told you so. With kind regards to all your family, I remain, dear Heald,

Yours sincerely, 
A. De Morgan. 
REFERENCES 


Williams, C. B. (1956). Studies in the history of probability and statistics. IV. A note on an early
statistical study of literary style. Biometrika, 43, 248-56.

Sophia de Morgan (1882). Memoir of Augustus de Morgan by his Wife Sophia Elizabeth de Morgan
with Selections from his Letters. London: Longman, Green and Co.







REVIEWS 


Dictionary of Statistical Terms. By M. G. Kendall and W. R. Buckland. Edinburgh:
Oliver and Boyd Ltd., for the International Statistical Institute with the
assistance of UNESCO. 1957. Pp. ix+493. 25s.


This book constitutes a worthy and useful contribution to the consolidation of statistical terminology. 
It became necessary as a result of the vast recent development of statistical theories and their applica- 
tion. Great difficulties are involved in such a task since many authors create their own terminology 
and insist on its use. The book contains about 1600 terms with clear definitions. Numerous cross- 
references increase the importance of the information. 

Instead of attempting to construct a ‘best’ terminology, the authors reproduce mainly the existing 
one. This is bound to lead to certain difficulties. The probability $F(x)$ of a value up to $x$ is called the
distribution function while probability distribution, frequency distribution, frequency function, 
stand for the derivative, i.e. the density function. Certain terms are rightly characterized as obsolete 
and others as equivalent to better ones. 

The formulae included are very helpful for the understanding. In most cases, the original author
and the year are stated. The lack of further bibliographic reference is regrettable for contemporary 
authors and still more for the history of statistics. A statement (35) such as ‘Carli (1764)’ is not very 
helpful without any indication where such an article can be found. 

The book covers mainly mathematical statistics, but, in addition, it contains some terms used in 
quality control, economic, technical, and physical statistics. This part could be increased. The restraint 
in including population statistics is reasonable since the United Nations are bringing out a Multilingual 
Demographic Dictionary. 

The second part gives the statistical terms used in the French, German, Italian and Spanish lan- 
guages in alphabetical arrangement with English translations and references to the pages where the 
English definitions are given. The necessity of including Spanish may be doubted. Since many more 
contributions have been made by Russian than by Spanish authors, a future edition should also contain 
a Russian dictionary in Cyrillic and Latin letters. The inverse problem, namely, German, French, 
and Italian translations of the English terms, is partially solved. An English-German dictionary exists, 
and a French translation is on the way. 

The authors should be congratulated for their patience and achievements. They have made a great 


contribution to international scientific co-operation. E. J. GUMBEL 


Statistical Analysis of Stationary Time Series. By U. Grenander and M. Rosenblatt.
New York: John Wiley and Sons Inc., London: Chapman and Hall, Ltd.
Stockholm: Almqvist and Wiksell. 1957. Pp. 300. 88s.


The authors of this work have done statisticians a service in bringing together for the first time the 
diversity of applications of time-series analysis in the physical sciences, especially in electrical engin- 
eering. Their approach is concerned almost entirely with the spectrum of a stationary process and is 
significant of the shift in emphasis which has recently taken research away from empirical models and 
the auto-correlation function to direct estimation of the spectrum. 

However, the present writer is unable to share the authors’ belief that in addition to being useful 
to specialists, this book will also serve the needs of research workers in these fields. This is due to the 
fact that the development is very formal, drawing heavily on the language of measure theory and the 
methods of the theory of functions. Whilst the former is hardly necessary (although fashionable !) 
the latter is extremely important and is used with elegance in this book. However, the applied worker 
will find it very difficult to assimilate page after page of analysis in order to glean the essential features. 

Chapter 1, dealing with the fundamental properties of stationary processes is one of the best in the 
book and contains a series of very interesting electrical applications. Chapter 2 is the most difficult 
to read and is concerned with least squares problems when the spectrum is known. Chapter 3 con- 
stitutes a rather unsuccessful attempt at summarizing the auto-correlation approach to time-series 
using finite parameter models. Here again, mathematics rules the day, e.g. a very elaborate proof 
of the derivation of Fisher’s ‘g’ distribution is given without even a mention of the more practical
simplifications which are available, such as the approach to harmonic analysis using the analysis of
variance which has been given by Hartley. This chapter is concluded by a discussion of the limitations 
inherent in this approach to time-series; these are reasonable and the writer has given a more detailed 
account of this elsewhere (see Jenkins and Priestley, J.R. Statist. Soc. 19). They also remark that owing 
to non-stationarity, spectral methods may not be applicable directly in economics and related sciences.
This is difficult to understand since anyone with experience in estimating noise and turbulence 
spectra will testify that these complications arise here in the same annoying manner. 

Chapters 4 and 6 contain the authors’ fundamental contributions to the asymptotic distribution 
theory of spectral estimates and the evaluation of their biases and variances. It is gratifying to observe 
that the uncertainty principle is not mentioned, but is apparently replaced by a mean-square error 
criterion for comparing various estimators, the latter being the sum of the variance and the bias rather 
than the product as considered in the uncertainty principle. Chapter 5 is concerned essentially with 
the applied mathematics of noise, ocean-wave and turbulence spectra. Some attention is given to 
analogue methods for estimating the spectral density, but it is surprising that no mention is made of 
a method which corresponds exactly to the digital approach. This is to divide the series into a number
of sections and then average harmonic analyses conducted on each subsection. The writer knows of 
at least one instrument based on this principle. 

Chapter 7 deals with regression problems when the residuals are stationary time-series. The inter- 
esting result is proved that in certain regression problems (including the important cases of trigo- 
nometric and polynomial regressions) the estimates of the coefficients are asymptotically fully efficient 
when the residuals are assumed to be independent (i.e. by ignoring their variance-covariance matrix). 
In practice, the problem of mixed spectra is that the separation of signal and noise is seriously affected 
by the size of the sample, but no account is given of any work dealing with this problem. Chapter 8 
deals with a few assorted problems and this is followed by a series of exercises dealing with each 
chapter; there is also an appendix on complex variable theory. 

Lack of space prevents discussion of further points of detail save to mention that the authors are 
unconvincing on the question of applications. By the latter is meant the empirical analysis of observed 
series and not the properties of stationary models used in electrical engineering, etc. In fact, we are 
not given an example of a practical auto-correlation function, or how the authors would deal with the 
very troublesome points of non-stationarity and the importance of the truncation point on spectrum 
estimates. 

One does not expect an answer to all the problems, especially as the authors have given many of 
the answers already, but it is reasonable to ask that the difficulties should be brought to the fore and 
not concealed in a maze of mathematics. There is no doubt, however, that this work is a welcome 
addition to the limited number of books on time-series and it is to be hoped that it will be widely read. 


G. M. JENKINS 


Psychological Tests and Personnel Decisions. By L. J. Cronbach and G. C. Gleser.
Illinois: University of Illinois Press. 1957. Pp. 165. $3.50. 


The traditional theory of mental testing, as Dr Cronbach and Dr Gleser point out, has for the most 
part been based on the principle (stated most clearly perhaps by Clark Hull) that ‘the ultimate purpose 
of using aptitude tests is to estimate or forecast aptitudes from test scores’. From Kelly to Gulliksen 
the prime criterion for judging a psychological test has been precision of measurement. Correlation 
coefficients in the form of ‘reliability’ and ‘validity’ coefficients have commonly been used to deter- 
mine how far ‘uncertainty’ is reduced—uncertainty being assessed by the mean square error of the 
quantity to be measured. 

Dr Cronbach and Dr Gleser ‘propose to abandon’ this traditional point of view: for them ‘the 
ultimate purpose of personnel testing is to arrive at qualitative decisions’. And they insist that ‘con- 
siderations of cost and consequences of decisions must be taken into account in every statistical in- 
vestigation’ that deals with such problems. Their general method is based on the principles of ‘decision 
theory’ as developed by Abraham Wald; and their object is to show how the principles that he has 
elaborated for use in the economic and industrial field can be applied to psychological and educational 
testing. 

Owing to the heavier pressure of costs, British firms, education authorities, and government depart- 
ments have been far less ready than American to adopt large-scale schemes for psychological testing. 
Thus, when such methods were first proposed in this country for the purpose of selecting pupils for 
‘special and secondary schools’, it was necessary for the educational psychologist to demonstrate to 







those who employed him that the benefits to be anticipated from the new scientific procedures would 
more than counterbalance their cost. In some of the earliest reports on the problem, use was made of 
actuarial principles in the arguments advanced. The calculations, though rough and ready, were in 
fact a forerunner of the more precise and detailed procedures advocated in Psychological Tests and 
Personnel Decisions. 

Let $l_1$ denote the cost of conducting the annual scholarship examination in accordance with the old
procedure (two printed papers, set and marked by a salaried board of examiners).* If the validity of
the examination were zero, $l_1$ would measure in monetary equivalents the ‘loss’ incurred by its adoption.
If the validity were perfect, we may take $g_1$ to denote the resulting benefit or ‘gain’. Let $p_1$ be
the proportion of this gain which, we anticipate, will result from such imperfect validity as the procedure
is found to possess: it may be conveniently calculated as an index-figure, ranging from 0 to 1,
and expressing the probability that the procedure will attain complete success in every case. Then the
‘mathematical expectation’ will be

$x_1$ (say) $= p_1 g_1 - (1 - p_1) l_1 = p_1 (g_1 + l_1) - l_1$.

With a similar notation for the gain, loss, and weighting to be expected from the new policy proposed
(e.g. adding a printed group test of intelligence for all candidates and an interview for borderline cases),
we may write (for this second alternative)

$x_2$ (say) $= p_2 (g_2 + l_2) - l_2$.

With varying borderlines and varying groups of candidates, we shall achieve different degrees of 
success with each of the two procedures. Briefly the deciding principle proposed was to examine the 
minimum of the two mathematical expectations deducible for all possible cases, and to recommend 
that particular procedure which would maximize the minimum mathematical expectation. 
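
The deciding principle just described can be illustrated by a minimal sketch in Python. All figures below are invented for illustration and are not taken from the 1917 memorandum; the function names are likewise hypothetical. The expectation for each case is taken here as p·g − (1 − p)·l, and the procedure whose worst case is largest is the one recommended.

```python
def expectation(p, gain, loss):
    """Expectation for one case, assumed to be p*gain - (1 - p)*loss."""
    return p * gain - (1 - p) * loss


def maximin_choice(procedures):
    """Return (name, worst_case) for the procedure with the largest minimum expectation."""
    worst = {
        name: min(expectation(p, g, l) for p, g, l in cases)
        for name, cases in procedures.items()
    }
    best = max(worst, key=worst.get)
    return best, worst[best]


# Hypothetical cases (p, gain, loss) for varying borderlines and groups of candidates.
procedures = {
    "old": [(0.5, 100.0, 10.0), (0.25, 100.0, 10.0)],
    "new": [(0.75, 100.0, 20.0), (0.5, 100.0, 20.0)],
}

choice, value = maximin_choice(procedures)
print(choice, value)
```

With these invented figures the dearer procedure is preferred because its worst case still exceeds the worst case of the cheaper one; other figures could reverse the recommendation.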

Much the same type of argument was later adopted at the National Institute of Psychology in 
suggesting methods of vocational selection to industrial firms, and in recommending methods for 
allocating army recruits to various training courses during the war. In each case the deciding factor 
was not the absolute ‘validity’ of the procedure proposed, but the additional information to be expected 
as weighted against the additional cost. 

At first sight this approach might seem almost the opposite of that adopted by Wald. Wald has 
pointed out that such statistical decisions may be compared to ‘zero-sum two-person games’ where 
the decision-maker is, as it were, playing against an obstinate and somewhat blind player whom he 
designates ‘Nature’; and he suggests that the safest policy will be, not to maximize the minimum gain, 
but to minimize the maximum loss. As Cronbach and Gleser observe, this procedure is tantamount to 
assuming that ‘if anything can go wrong, it will’: ‘so pessimistic a view’, they add, ‘is doubtfully 
appropriate in statistical decisions, since Nature is presumably indifferent rather than antagonistic’. 
A conservative principle may be the most prudent in solving the problems that confront the economist 
or manufacturer; but for the practical psychologist a more sanguine line of approach might be 
permissible. 

Complex problems in ‘selection’ confront the psychologists in many different fields; and recently 
the old controversy of selection for secondary school has cropped up again in this country in con- 
nexion with the ‘11 plus examination’. However, if one may judge by the British Psychological 
Society’s recent Report (P. E. Vernon et al., Selection for Secondary Schools, 1957), neither psychologists 
nor the educationists have as yet attempted to examine the practical issues from the standpoint of 
decision theory. It is the urgency of the problem and the gravity of this omission that makes the 
publication of Psychological Tests and Personnel Decisions so opportune for British psychologists at 
the present time. The authors themselves quote Prof. Vernon’s estimate of the validity of the ‘best 
test battery’, namely, 0-85; and write: ‘the correlation of 0-85 between prediction and grammar 
school success in England is far less significant than it seems at first glance. From the viewpoint of 
national policy the aim is to maximize total output. The present scheme does this only if the slope 
$\sigma_e r_{ye}$ is greater for the grammar schools than for the modern secondary schools. The facts for determining 
this are not available, since it has been thought that predictive validity alone is required.’ 

The mathematical procedures which Dr Cronbach and Dr Gleser propose may be regarded as an 
amplification of the classical method of assessing ‘mathematical expectations’ referred to above. 
They admit that their ‘mathematics is involved and laborious’. But, as they observe, this is the price 
paid for bringing in the various parameters required for a rigorous description: where this price is too 
high, the tester can ‘obtain approximate answers by using simplifying assumptions’. 


* Memorandum by the Psychologist to the London County Council on The Use of Intelligence Tests 
in Junior County Scholarship Examinations for Free Places in Secondary Schools (1917). 










With the formulation they put forward, the writers have no difficulty in showing that, even when 
the simplest conditions are postulated, there can still be ‘no simple answer to the query “How valuable 
for selection decisions is a test with validity r?”’ Instead they present a number of three-dimensional 
graphs, derived from their formulae and showing the intricate relations between (i) relative cost, 
(ii) the selection ratio (or cut-off score), and (iii) the validity of the test. They are thus able to provide 
a useful basis for determining the most appropriate strategies for particular cases. 

Having discussed in their opening chapter the general characteristics of ‘decision problems’ and the 
types of ‘personnel decisions’ that most commonly arise in psychological testing, Dr Cronbach and 
Dr Gleser carry out a detailed examination of such problems as the optimum selection ratio, the 
optimum length of a single test, the optimum size of a battery of tests, and the ‘distribution of effort’ 
in sequential testing. A special chapter is devoted to the neglected question of the ‘band-width- 
fidelity dilemma’, i.e. the difficulties raised by the fact that when the examiner or interviewer attempts 
to cover a wider range of clues or characteristics the accuracy of the information secured on each tends 
to diminish. As they point out, the most profitable types of test are those which, like the personal 
interview, can, when required, cover the widest range of clues or characteristics. 

After two shorter chapters on decisions with fixed and adaptive treatments, and on the general 
evaluation of outcomes, the book concludes with a chapter on the ‘assumptions implicit in the theory 
of psychological testing’, illuminated by the fresh light thrown on the problems of testing by the newer 
techniques of decision theory. ‘The test theory developed to date’, they contend, ‘covers only a small 
corner of the domain within which the decision-maker operates.... Wherever we turn, we find unanswered 
questions, and research of many types is needed, from simple fact-finding to major inventions.’ 

CHARLOTTE BANKS 


Regression Analysis of Production Costs and Factory Operation. By P. Lyle. 
(Third edition, revised by L. H. C. Tippett.) Edinburgh: Oliver and Boyd Ltd. 
1957. Pp. xiii+204. 16s. 


This well-produced book contains a wealth of material much of which will appeal to a professional 
statistician as well as to the industrialist with a moderate mathematical equipment who would like 
to learn something of statistics. Essentially the book is a record of the approach, albeit a somewhat 
groping approach at times, by which one erudite industrialist learned to apply statistical methods in 
his business. The illustrations have an authentic ring about them and are drawn from the author’s 
practical experience of the sugar-refining industry. This, the third edition, has been revised by 
Mr L. H. C. Tippett, but he has confined his revision to a few corrections and some notes, scattered 
through the book, that are designed to improve the clarity of exposition or to bring the discussion up 
to date. 

The main portion of the book (some 120 pages) has a dual purpose. From a statistical point of view 
there is a good treatment of regression analysis, linear and curvilinear with one independent variable, 
and multiple linear regression with two or more independent variables. In the course of this analysis 
many other statistical concepts are brought out, such as significance level, a working definition of 
probability, the estimation of standard deviation, meaning of fiducial limits and others. Uses of basic 
distributions such as the t-distribution are also included. An industrialist would regard this portion 
as a valuable contribution to the study of technique of cost analysis. The book separates out the short- 
term from the long-term changes and brings into account price and wage levels and other economic 
factors. There is also a good, though short, chapter which discusses the marginal costs of production. 
A wealth of diagrams illustrate the chapters and enable anyone who does not wish to delve too deeply 
into the mathematical side of the subject to get a firm grasp of the principles involved. A minor 
nuisance is having to turn to the end of the book every time a diagram is mentioned in the text. 

The last sixty pages are a gold mine of information that has clearly been culled from numerous 
sources over a long period of time and brought together under the unassuming heading of Appendix I 
and Appendix II. No practical statistician could fail to find something of interest here. There is a very 
sensible discussion of how to set about finding the mathematical form of some unknown function con- 
necting two or three variables when only a series of simultaneous observed values of the variables is 
available. Another section gives a clear account of the use of nomograms for three variable problems. 
Mr Lyle subsequently took the subject further in two papers in Applied Statistics (1954). The second 
appendix consists of a discussion of the meaning of the two terms correlation and regression. 

The book is rounded off with a short bibliography, a summary of the equations used, a glossary and 
an index. The standard of production is high. 

P. G. MOORE 







Experimental Designs (second edition). By W. G. Cochran and G. M. Cox. New 
York: John Wiley and Sons, Inc. London: Chapman and Hall Ltd. 1957. Pp. 611. 
82s. 


The first edition of this book was reviewed in Vol. 38 of this journal (1951, pp. 260-1). In that review 
it was remarked that the book should prove especially valuable to statisticians requiring a com- 
pendium of all the more useful designs. In the intervening period there has been considerable research 
activity in the field of experimental design. The authors are, in consequence, no longer able to give 
plans of all the more useful designs. As a compromise they have included an exhaustive index to those 
designs for which a plan is not provided. Two additional chapters have been inserted. One, on frac- 
tional replication, corrects a notable, and surprising, deficiency in the first edition. The other contains 
material on the study of response surfaces, based for the most part on the work of G. E. P. Box. This 
chapter is rather out of keeping with the rest of the book, as is perhaps to be expected of a late addition. 
Nevertheless, many statisticians will probably find this chapter, and particularly the attached plans, 
of considerable usefulness. 

It is evidence of the soundness of the first edition that very little of the original text has been 
altered or removed in the present edition. A number of additional paragraphs have been inserted, 
usually on points of detail omitted in the first edition. In particular, the inclusion of a section describing 
Yates’ well-known tabular method of analysing a $2^n$ table may be mentioned. It is unfortunate 
that no space has been found for some material on the economic choice of amount of experimentation, 
a topic which has attracted some attention in recent years. 
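
For readers unfamiliar with the tabular method of Yates mentioned above, the following is a minimal sketch in Python, not taken from the book under review. It assumes the treatment totals are supplied in standard order, (1), a, b, ab, c, ...; the function name `yates` and the sample figures are illustrative only. Each pass replaces the column by the sums of successive pairs followed by their differences, and after n passes the column holds the grand total followed by the factorial contrasts in standard order.

```python
def yates(responses):
    """Apply Yates' sum-and-difference passes to 2^n totals in standard order."""
    size = len(responses)
    n = size.bit_length() - 1
    if size < 2 or 2 ** n != size:
        raise ValueError("need 2^n responses with n >= 1")
    column = list(responses)
    for _ in range(n):
        pairs = [(column[i], column[i + 1]) for i in range(0, size, 2)]
        # Upper half: sums of successive pairs; lower half: second minus first.
        column = [a + b for a, b in pairs] + [b - a for a, b in pairs]
    return column


# Invented totals for a 2^2 experiment in the order (1), a, b, ab.
contrasts = yates([1, 2, 3, 4])
print(contrasts)  # grand total, then the A, B and AB contrasts
```

Dividing each contrast by 2^(n-1) times the number of replicates gives the usual effect estimates.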

The overall effect of the changes has been to consolidate the position of an already well-established 
text-book. The second edition is about one-third larger than the first, but the price has gone up by 
more than three-quarters. Even so, the book is still excellent value for the practising statistician. 


N. L. JOHNSON 


Statistické metody zemědělského a lesnického výzkumnictví. [Statistical methods 
of agricultural and forestry research]. By Václav Myslivec. Praha: Československá 
Akademie Zemědělských Věd. 1957. Pp. 555. Kčs 67. 


To the English reader, this book can scarcely do more than give an encouraging impression of progress 
in the use of statistical science in eastern Europe. The first one-fifth gives an elementary account of 
some of the standard problems in the statistics of small particles. Thereafter, the book conforms more 
to the pattern familiar in text-books in English, with an account of standard distributions and tests, 
analysis of variance and experimental design, and an outline of the mathematical theory. Forty pages 
of statistical tables are included. If a general impression can be trusted, the book is very good; the 
typography and printing are of a standard seldom attained for text-books in any country. 


D. J. FINNEY 


An Introduction to Probability Theory and its Applications. Vol. 1 (second 
edition). By W. Feller. New York: John Wiley and Sons Inc. London: Chapman 
and Hall Ltd. 1957. Pp. 461. 86s. 

This is the second edition of a book which was first issued in 1950. There are some alterations but not 
many. The major difference from the first edition appears to be the insertion of Chapter III, ‘Fluctuations 
in Coin Tossing and Random Walks’, where the random walk occurs a little earlier than originally. 
During the seven years since this book was issued it has rapidly established itself as a classic in probability 
theory connected with the discrete variable. One notes, hopefully, that it is Vol. I. 


F. N. DAVID 


Vector Spaces and Matrices. By R. M. Thrall and L. Tornheim. New York: John 
Wiley and Sons Inc. London: Chapman and Hall Ltd. 1957. Pp. 318. 54s. 

Although linear algebra is now an important part of every specialist course in mathematics or statistics, 
most text-books on the subject are too abstract for beginners. Books similar to Birkhoff and MacLane’s 
Survey of Modern Algebra, but dealing with linear algebra in greater detail, as this book does, are 
therefore very welcome. In foregoing the rigidly axiomatic approach, the authors have chosen to lose 
a little generality, but the gain in clarity and persuasiveness which they achieve easily outweighs this 
loss. 

The vector theory in this book is mainly finite dimensional. Matrices express linear transformations, 
and their main properties are derived before determinants are introduced. Hermitian and unitary 
matrices are associated with a scalar product and the notion of an adjoint. The text is therefore an 
excellent basis for the study of representation theory and for the study of Hilbert space. The treatment 
of canonical forms, polynomial rings and the decomposition of algebras, is exceptionally clear. A con- 
cluding chapter deals with linear inequalities and applications to linear programming and matrix 
games; this most interesting chapter is all too short. 

The rigorous but unlaboured style of the book should appeal to the serious student. The authors 
have evidently given the greatest care to the details of presentation, and they have done this without 
spoiling the freshness of their exposition. It is a text which students can be recommended to study and 
to imitate. The layout is excellent and the book is singularly free from misprints. 

H. KESTELMAN 


Digital Computer Programming. By D. D. McCracken. New York: John Wiley and 
Sons Inc. London: Chapman and Hall Ltd. 1957. Pp. 253. 62s. 


The growth of electronic computers over the past ten years has produced a spate of books on the theory 
and practice of the subject of computing. Many of these books have been specifically devoted towards 
the technique of preparing problems for a machine, the technique commonly known as programming. 
Most accounts of programming are written round a particular machine and the present book is to be 
welcomed as a general account of the subject. The illustrations are based on a hypothetical machine 
called TYDAC which seems to be a hybrid sort of creature the characteristics of which are drawn from 
a number of well-known computers. The book assumes that the problem has already been reduced to 
one of numerical analysis and deals with the four stages: programming, coding the programme, 
checking the programme and the final production of the required solutions. 

On programming, the book starts with an explanation of why binary notation is used. It then gives 
the basic operations that the machine will carry out including the use of loops and subroutines. The 
advantages of floating decimal point methods are discussed and a warning given on the care necessary 
to keep the required number of significant figures. There is also a description of the small changes 
necessary with magnetic tape, and a good description of the technique of double precision arithmetic 
which adds greatly to the accuracy of calculations at the expense of storage facilities. 
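
The trade-off between precision and storage can be illustrated with a short sketch in Python. It is not drawn from the book and has nothing to do with TYDAC; it uses modern binary floating point, with 4-byte single precision simulated through the standard `struct` module and Python's ordinary 8-byte floats standing in for double precision. The helper names are hypothetical.

```python
import struct


def to_single(x):
    """Round a Python float (8 bytes) to the nearest 4-byte single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(value, count, single=False):
    """Add `value` to a running total `count` times, optionally rounding to single precision."""
    step = to_single(value) if single else value
    total = 0.0
    for _ in range(count):
        total += step
        if single:
            total = to_single(total)
    return total


count = 100_000
exact = count / 10  # the true value of 0.1 added 100,000 times
single_error = abs(accumulate(0.1, count, single=True) - exact)
double_error = abs(accumulate(0.1, count) - exact)
print(single_error, double_error)
```

The double-length total occupies twice the storage per number but loses far fewer significant figures over a long chain of additions.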

Coding is dealt with somewhat more briefly. This is presumably because a large part of program- 
ming is common to all machines whilst coding is rather more specialized. However, a great deal of 
emphasis is laid on the necessity for checking the coded programme before it is actually put on the 
machine. Hardly any programme is produced without a single mistake the first time. Generally, 
computer time is rather more valuable than the time of a human programmer, and it is a sound 
economic proposition to test all parts of the programme before transferring to the machine. Of course 
it is also possible to build checks into the actual programme so that results can be verified in stages. 

The last part of the book brings together various devices for improving the programme and discusses 
methods of arranging the programme so that the operating time is cut down to a minimum. 

Programming can never be taught without seeing the actual results on a machine, but this book 
provides a very readable and straightforward account of the whole process, and would put the reader 
in a good position to tackle the programming manual issued with any electronic computer. 


P. G. MOORE 


Individual Differences in Night-Vision Efficiency. [Medical Research Council 
Special Report Series, No. 294.] By M. H. Pirenne, F. H. C. Marriott and E. F. 
O’Doherty. London: H.M.S.O. 1957. Pp. 83. 8s. 


This report arises out of the war-time need to devise tests for selecting individuals with especially 
good night vision. In all tests the subject is presented with a visual task to perform at a very low light 
intensity and given a score based on his performance. The difficulty is to know just what visual task 
to set. In order of increasing complexity the task may consist in: 

(1) The correct recognition of a very dim flash of light in an otherwise completely dark room. The 
lowest light intensity that can be reliably recognized in this way is called the threshold. 







(2) The ability to recognize simple patterns, e.g. letters, correctly at very low levels of illumination. 
This test of visual acuity is used at higher light intensity by opticians in fitting spectacles. 

(3) The ability to perceive correctly a scene or picture containing many objects. Here many non- 
visual factors enter the game, for objects are identified partly by their appearance, partly by the degree 
to which they ‘make sense’ in their context. Tests of this kind approximate most closely to practical 
life, but are difficult to set and score. Threshold tests, on the other hand, are easy to set and score, 
but their practical relevance is not clear. 

The authors of this report have performed tests of each kind with great care on a selection of sub- 
jects and have demonstrated that the results of the three types of test are very strongly correlated, 
so that for practical purposes one may use the most convenient one. The relationship between the tests 
turned out to be a very simple one; the less sensitive individual always required k times as much light 
as the more sensitive one. It is just as though his eye contained a filter which transmitted only 1/k 
of the incident light, a finding which the authors embody in their ‘filter factor’ theory. 
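
The ‘filter factor’ idea can be sketched numerically. The Python fragment below is an illustration only and is not taken from the report: it assumes that the number of quanta acting in a flash follows a Poisson distribution and that a flash is seen when at least some fixed number of quanta act. The threshold of 6 and the function name `prob_seen` are hypothetical choices. Under these assumptions an observer with filter factor k behaves exactly like the most sensitive observer shown a flash 1/k as bright.

```python
import math


def prob_seen(mean_quanta, threshold=6, filter_factor=1.0):
    """Chance that at least `threshold` quanta act, given a Poisson count.

    The mean count is mean_quanta / filter_factor, i.e. the eye is treated
    as if a filter passed only 1/filter_factor of the incident light.
    """
    mean = mean_quanta / filter_factor
    below = sum(
        math.exp(-mean) * mean ** j / math.factorial(j) for j in range(threshold)
    )
    return 1.0 - below


for intensity in (2.0, 6.0, 12.0):
    print(intensity, round(prob_seen(intensity), 3))
```

The printed fractions rise with intensity but never jump from 0 to 1, which is the ‘uncertain seeing’ near the threshold.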

These conclusions are supported by detailed experimental reports which may be of little interest to 
the non-specialist. However, statistical readers will be interested by the sections dealing with the 
basic physiology of vision in very dim light. Performance near the threshold is characterized by 
uncertain seeing, that is, only a fraction of the flashes given are reported by the subject. Naturally 
this fraction gets smaller as light intensity is reduced. The interesting thing is that this statistical 
fluctuation is shown to arise from the discontinuous nature of light combined with the remarkably 
high sensitivity of the eye. So few quanta are emitted in the flash that the inevitable variation in their 
number is quite sufficient to account for the variation in the subject’s response. 

D. R. WILKIE 


Life and Other Contingencies, Vol. II. By P. F. Hooker and L. H. Longley-Cook. 
Cambridge: Published for the Institute and Faculty of Actuaries at the University 
Press. 1957. Pp. 256. 20s. 


The first volume of this work was reviewed in Vol. 42 (1955, p. 274) of this journal. In that review it 
was mentioned that this is ‘one of a series commissioned by the Institute of Actuaries and the Faculty 
of Actuaries to provide a course of reading suitable for the examinations conducted by these bodies’. 
The present volume should meet this requirement admirably, in that it sets out the points likely to be 
required of examination candidates clearly, and in a form which should aid the memory of the average 
student. 

There is necessarily rather greater need for algebraic manipulative skill in this volume than 
in the first volume. The authors have, however, succeeded in keeping this more technical aspect of the 
subject within manageable proportions, and have produced a work from which a fairly clear idea of 
the subject can be obtained by reasonable application to study. They have avoided, as far as possible, 
the more controversial parts of their subject by openly basing their work on the assumption that the 
different modes of decrement act independently, and concentrating attention on detailed investigation 
of the consequences of this assumption. 

Interesting features of the book include a discussion of the method of uniform seniority applied to 
cases where 

(i) $\mu_x = A + Hx + Bc^x$ (Makeham’s second modification of Gompertz’s law), 
or (ii) $\mu_x = A + Bc^x + Mn^x$ (the ‘double geometric’ mortality law); 
three chapters on calculations connected with pensions and widows’ and orphans’ funds, and a chapter 
on disability benefits. There is also an addendum describing the International Actuarial Notation 
now officially adopted by the Institute and Faculty. 
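
As background to the method of uniform seniority, the sketch below (in Python, with invented parameter values) treats only the simpler case of Makeham's first law, $\mu_x = A + Bc^x$, which is not one of the two cases listed above and is not the book's own working. Two lives aged x and y are replaced by two lives of a common age w chosen so that $2c^w = c^x + c^y$; the total force of mortality is then unchanged. The last lines show the residual that a linear term Hx would leave, which suggests why the second modification needs separate treatment.

```python
import math


def makeham(age, A, B, c):
    """Force of mortality under Makeham's first law, A + B*c**age."""
    return A + B * c ** age


def equal_age(x, y, c):
    """Common age w satisfying 2*c**w == c**x + c**y."""
    return math.log((c ** x + c ** y) / 2.0) / math.log(c)


# Hypothetical parameter values, chosen only for illustration.
A, B, c, H = 0.0007, 0.00005, 1.1, 0.00001
x, y = 30.0, 50.0

w = equal_age(x, y, c)
joint = makeham(x, A, B, c) + makeham(y, A, B, c)
replaced = 2 * makeham(w, A, B, c)

# Residual that a linear term H*age would leave after the same substitution.
linear_gap = H * (x + y - 2 * w)
print(w, joint, replaced, linear_gap)
```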

This is a book for the student and the specialist. There are worked examples at the end of each 
chapter, but the value of the book to a student would be enhanced by the provision of exercises and 
problems to be attempted. Nevertheless, for the somewhat restricted class of reader envisaged, the 


book should certainly prove a profitable purchase. N. L. JOHNSON 










OTHER BOOKS RECEIVED 


Tables of Integrals and other Mathematical Data (third edition). By H. B. Dwight. New York 
and London: The Macmillan Company. 1957. Pp. 288. 21s. 


Vector Analysis. By L. Brand. New York: John Wiley and Sons, Inc. London: Chapman and 
Hall Ltd. 1957. Pp. 282. 48s. 


Annual Epidemiological and Vital Statistics 1954. World Health Organization, Geneva, Switzer- 
land. [U.K. Sales Agent, H.M.S.O.] 1957. Pp. 617. 50s., $10 or Sw.fr. 30. 


Extension of existing Tables of Attributes Sampling Plans. Sandia Corporation Tech. Mem. 
146-56-51. By J. M. Wiesen. Washington, D.C.: U.S. Department of Commerce. 1956. Pp. 236. 
$1.25. 


Underground Corrosion. National Bureau of Standards Circular 579. By Melvin Romanoff. 
Washington, D.C.: U.S. Department of Commerce. 1957. Pp. 227, 103 illustrations. $3.00. 


Man’s Journey through Time. By L. S. Palmer. London: Hutchinson and Co. (Pubs.) Ltd. 
1957. Pp. xv+184. 30s. 


Introduction to Statistical Inference. By J. C. R. Li. Michigan: Edwards Bros. Inc. 1957. 
Pp. 553. $7.50. 


Analysis of Multiple Time-Series. (No. 1 of Griffin Statistical Monographs and Courses.) By 
M. H. Quenouille. London: Charles Griffin and Co. Ltd. 1957. Pp. 105. 24s. 


Health and Medical Care in New York City. Report by Committee for the Special Research 
Project in the Health Insurance Plan of Greater New York. Cambridge, Mass: Published for the 
Commonwealth Fund by Harvard University Press. 1957. Pp. 275. 60s. 


Fractional Factorial Experiment Designs for Factors at Two Levels. Applied Mathematics 
Series 48. Washington, D.C.: U.S. Department of Commerce. 1957. Pp. 85. 50 cents. 


Concise Tables for Statisticians. By K. C. S. Pillai. Manila, Philippines: The Statistical Center, 
University of the Philippines. 1957. Pp. 50. $1.50. 


The Mathematical Theory of Epidemics. By N. T. J. Bailey. London: Charles Griffin and Co. 
1957. Pp. 194. 36s. + postage 1s. 


Statistiska Metoder. Vols. 1 and 2. By H. Hyrenius. Göteborg, Sweden: Gumperts. 1957. Pp. 625. 
Sw.kr.42. (£2. 18s. 0d.) 


Probability, an Intermediate Textbook. By M. T. L. Bizley. Cambridge: Published for the 
Institute and Faculty of Actuaries by the University Press. 1957. Pp. viii+230. 20s. 


Linear Algebra for Undergraduates. By D. C. Murdoch. New York: John Wiley and Sons Inc. 
London: Chapman and Hall Ltd. 1957. Pp. 239. 44s. 


A Course in Multivariate Analysis. (No. 2 of Griffin Statistical Monographs and Courses.) By 
M. G. Kendall. London: Charles Griffin and Co. Ltd. 1957. Pp. 185. 22s. 


Bulletin Statistic Trimestrial. Roumania: Direction Centrale de Statistique. 1957. Parts 1 
and 2, Lei. 5.; Part 3, Lei. 3. 


Statistical Exercises, Part II. Analysis of Variance and Associated Techniques. By N. L. 
Johnson. London: Department of Statistics, University College London. 1957. Pp. 107. 12s. 


Games and Decisions. Introduction and Critical Survey. By R. Luce and H. Raiffa. New 
York: John Wiley and Sons Inc. London: Chapman and Hall Ltd. 1957. Pp. 509. 70s. 







CORRIGENDA 

Biometrika (1954), 41, pp. 375-89 
‘Some further results in the theory of pedestrians and road traffic.’ 

By ALAN J. MAYNE 

I am indebted to A. R. Bloemena, Research Fellow of the Statistical Department of the 
Mathematisch Centrum, Amsterdam, for pointing out an error in the second equation of 
the proof of Lemma 5 in this paper, on p. 383. This equation should read 
OFF. v+t 
Q,(05 1) = 710) | Q,alo—w+t; 0) dau). 
Continuing the argument as in the original text, it is found that, writing 
$\Gamma(s) = \Gamma(s; 0) = L\{s, u; G(u)\}$, 
tiv. 28 
By Ls, v; Oz; Vv; 0)} — s{1—2I'(s)]’ 
> Oe: v: a) = 2a # ) 
es I{s, v; Q(z; v; t)} = 3; ai-a'6N (48) (as corrected) 
- Ole: oy} = 1-2 AE -TO) 

es - Lis, v; OG; *)} = 5 — rari —ar(e)] ’ 

which is the correct version of equation (47) in Lemma 5. 

When this correction is applied, it is found that equation (53) on p. 385 is still valid, so 
that no correction is needed for Theorems 4 and 5. 

A.J.M. 
Biometrika (1956), 43, 433 
‘Confidence intervals for a proportion.’ 

By EDWIN L. CROW 


The displayed equation for R; should read: 
By n Wnld, , 

vail [32] r=1 7 ; 



















TABLES OF SYMMETRIC FUNCTIONS—ERRATA 


By F. N. DAVID and M. G. KENDALL 


The following errors have been noticed in our tables of symmetric functions (Biometrika (1949), 
36, 431; (1951), 38, 435; (1953), 40, 427; (1955), 42, 223). 


Table 1. 
Table 2. 


Table 3. 


Table 4. 
Table 5. 


No errors noticed. 
2.9(i), p. 441. 


€ 12 
a. 1s 


_ 


v), p. 453. 


3.12(ii), p. 460. 
3.12(v), p. 462. 
No errors noticed. 
5.6, p. 224. 
5.9(i), p. 226. 


or 


5.10(iii), p. 229. 
5.11(v), p. 235. 
5.12(i), p. 236. 
5.12(v), p. 240. 
5.12(v), p. 240. 
5.12(v), p. 240. 


cr 


an oo 


Coefficient a,?a,°, (32) should be 64, not 61. 

Coefficient a,a,°, (32)? should be 51, not 50. 

Coefficient a, a,?a,?, (32)? should be 120, not 12. 

Coefficient a,?a,°, (72) should be — 11, not —1. 

The marginal item four lines from the top given as a,)@,, should be 
494,". 

Coefficient h,”, a,a,° should be — 2, not 2. 

Coefficient hghs?, a,a.'a, should be 4, not 5. 


Coefficient (2)*, ag should be — 15, not 15. 

Column headed a,a,a, two lines from right-hand margin should be 
headed a,a,a,°. 

Coefficient (2)? (1)5, a,4a, is correct but out of line. 

Coefficient (4) (3) (1), aga, should be — 30240, not — 30260. 

Coefficient of (3)* (1), a,a,? should be 134400, not zero. 

Coefficient of (5) (1)®, a,,a, should be 66528, not 665280. 

Coefficient of (7) (2)? (1), a,a.2a,5 should be 119, not 118. 

Coefficient of (4)? (1)*, a;2a,2 should be 29937600, not 27937600. 

Coefficient of (5) (2)? (1)8, a;2a, should be 19958400, not 1958400. 

Coefficient of (4) (1), aga,a,a, should be — 4989600, not — 9989600. 










TRACTS FOR COMPUTERS 


Department of Statistics, University College, London 


I. Tables of the Digamma and Trigamma Functions. By ELEANOR PAIRMAN, M.A. 
Tables for summing $S = \sum \dfrac{1}{(p_1 s + q_1)(p_2 s + q_2) \cdots (p_n s + q_n)}$, where the p’s and q’s are numerical 
factors. Price 5s. net. 


V. Table of Coefficients of Everett’s Central-Difference Interpolation Formula. By A. J. 
THOMPSON, PH.D. Second edition. Price 7s. 6d. net. 


VIII. Table of the Logarithms of the Complete Γ-Function (to ten decimal places) for 
Argument 2 to 1200 beyond Legendre’s Range (Argument 1 to 2). By EGON S. PEARSON, 
D.Sc. Price 5s. net. 


IX. Log Γ(x) from x=1 to 50·9 by intervals of 0·01. By JOHN BROWNLEE, M.D., D.Sc. 
Price 5s. net. 


X. On Quadrature and Cubature or on Methods of Determining Approximately Single 
and Double Integrals. By J. O. IRWIN, D.Sc. Price 7s. 6d. net. 


XII. Tables of the Probable Error of the Coefficient of Correlation. By KARL HOLZINGER, 
PH.D. Price 5s. net. 


XIII. Bibliotheca Tabularum Mathematicarum, being a Descriptive Catalogue of Mathematical 
Tables. Part I. A, Logarithms of Numbers. By James HENDERSON, PH.D. Price 9s. net. 


XV. Random Sampling Numbers. By L. H. C. TIPPETT, M.Sc., with a Foreword by KARL 
PEARSON. Price 5s. net. 


XXIII. Tables of $\tan^{-1} x$ and $\log(1+x^2)$. To assist in the calculation of the ordinates of a Pearson 
Type IV curve. By L. J. COMRIE, PH.D. Price 5s. net. 


XXIV. Random Sampling Numbers (2nd Series). By M. G. KENDALL and B. BABINGTON SMITH. 
Price 5s. net. 


XXV. Random Normal Deviates. By HERMAN WOLD. Price 5s. net. 


XXVI. Correlated Random Normal Deviates. By E. C. FIELLER, T. LEWIS and E. S. PEARSON. 
Price 10s. 6d. net. 


Nos. II, III, IV, VI and VII are out of print 



LOGARITHMETICA BRITANNICA 


A standard Table of Logarithms to Twenty Decimal Places. By A. J. THOMPSON, Ph.D. 
(commenced in 1922 to commemorate the tercentenary of the publication of HENRY BRIGGs’s 
Arithmetica Logarithmica). 

The nine separate sections of this Table have now been issued, and the complete work 


consisting of the logarithms of numbers 10,000-100,000, together with Dr Thompson’s 
General Introduction (98 pp.) is now available in two bound volumes. 


Price £8. 8s. 0d. 










Issued by the CAMBRIDGE UNIVERSITY PRESS, Bentley House, LONDON, N.W.1 
on behalf of the 
DEPARTMENT OF STATISTICS, UNIVERSITY COLLEGE, LONDON 
and obtainable from any bookseller 





(All rights reserved) 
BIOMETRIKA. Vol. 45, Parts 1 and 2 
CONTENTS 


P ArmitTaGE. Numerical studies in the sequential estimation of a binomial parameter 
™-P. H. Lzestiz.. A stochastic model for studying the properties of certain biological systems by numeri- 
cal methods 
J. G. Skex~tam. On the derivation ond applicability ‘of Neyman’ 8 type A ‘Ginteibution 
C. I. Buiss and A. R. G. Owen. Negative binomial distributions with a common k 
W. Brass. Simplified methods of fitting the truncated negative binomial distribution 
D. R. Cox. The interpretation of the effects of non- additivity i in the Latin square ‘ 
- J.R. ASHFORD. Quantal responses to mixtures of poisons under conditions of simple similar action= 
the analysis of fincontrolled data A 
P. G. Moore. Some properties of runs in quality control procedures 
E. J. Williams. Simultaneous regression equations in experimentation 
Thomas S. Russell and Ralph Allan Bradley. One-way variances in a two-way classification 
R. L. Plackett. Studies in the history of probability and statistics. VII. The principle of the 
arithmetic mean 
R. L. Brown and F. Fereday. Multivariate linear structural relations 
Ingram Olkin. Multivariate ratio estimation for finite populations 
D. E. Barton, F. N. David and C. L. Mallows. Non-randomness in a sequence of two alternatives. 
I. Wilcoxon’s and allied test statistics 
Leo A. Goodman. Simplified runs tests and likelihood ratio tests for Markoff chains 
Roy B. Leipnik. Moment generating functions of quadratic forms in serially correlated normal 
variables 
J. G. Saw. Moments of sample moments of censored samples from a normal population 
G. Herdan. The relation between the dictionary distribution and the occurrence distribution of word 
length and its importance for the study of quantitative linguistics 
J. K. Mackenzie. Second paper on statistics associated with the random disorientation of cubes 
W. A. O’N. Waugh. Conditioned Markov processes 
MISCELLANEA 


Rita Maurice. Ranking means of two normal populations with unknown variances 
D. E. Barton and F. N. David. Non-randomness in a sequence of two alternatives. II. Runs test 
Max Halperin and S. W. Greenhouse. Note on multiple comparisons for adjusted means in the 
analysis of covariance 
H. R. B. Hack. An empirical investigation into the distribution of the F-ratio in samples from 
two non-normal populations 

N. L. Johnson. Theoretical considerations regarding H. R. B. Hack’s system of randomization for 
cross-classifications 

D. E. Barton. On the equivalence of two tests of equality of rate of occurrence in two series of 
events occurring randomly in time 

G. Herdan. The mathematical relation between Greenberg’s index of linguistic diversity and 
Yule’s characteristic 


J. E. Kerrich. Note on a discontinuous probability density 

Harald Bergström. A remark on Spearman’s rank correlation coefficient 

C. W. Clunies-Ross. Interval estimation for the parameter of a binomial distribution 

A. Horrson. Further critical values for the sum of two variances 

R. D. Lord. Studies in the history of probability and statistics. VIII. De Morgan and the statistical 
study of literary style 


REVIEWS 


M. G. Kendall and W. R. Buckland. Dictionary of Statistical Terms 

U. Grenander and M. Rosenblatt. Statistical Analysis of Stationary Time Series 

L. J. Cronbach and G. C. Gleser. Psychological Tests and Personnel Decisions 

P. Lyle. Regression Analysis of Production Costs and Factory Operation 

W. G. Cochran and G. M. Cox. Experimental Designs 

Václav Myslivec. Statistické metody zemědělského a lesnického výzkumnictví (Statistical 
methods of agricultural and forestry research) 

W. Feller. An Introduction to Probability Theory and its Applications. Vol. I. 

R. M. Thrall and L. Tornheim. Vector Spaces and Matrices 

D. D. McCracken. Digital Computer Programming 

M. H. Pirenne, F. H. C. Marriott and E. F. O’Doherty. Individual Differences in Night-Vision 
Efficiency 

P. F. Hooker and L. H. Longley-Cook. Life and Other Contingencies. Vol. I. 

OTHER BOOKS RECEIVED 


CORRIGENDA 




Printed in Great Britain at the University Press, Cambridge (Brooke Crutchley, University Printer) 
















